Abstract

In this work, we address the following matrix recovery problem: suppose we are given a
set of data points containing two parts, one part consisting of samples drawn from a union
of multiple subspaces and the other part consisting of outliers. We do not know which data
points are outliers, or how many outliers there are. The rank and number of the subspaces
are also unknown. Can we detect the outliers and segment the samples into their right
subspaces, efficiently and exactly? We utilize a so-called Low-Rank Representation (LRR)
method to solve this problem, and prove that under mild technical conditions, any solution
to LRR exactly recovers the row space of the samples and detects the outliers as well. Since
the subspace membership is provably determined by the row space, this further implies
that LRR can perform exact subspace segmentation and outlier detection in an efficient
way.

Keywords: Low-Rank Modeling, Subspace Segmentation, Outlier Detection, Robust
Estimation, Nuclear Norm Regularization



1. Introduction 

This paper is about the following problem: suppose we are given a data matrix $X$, each
column of which is a data point, and we know it can be decomposed as

$$X = X_0 + C_0, \qquad (1)$$

where $X_0$ is a low-rank matrix with the column vectors drawn from a union of multiple
subspaces, and $C_0$ is a column-sparse matrix that is non-zero in only a fraction of the
columns. Except for these mild restrictions, both components are arbitrary. In particular, we
do not know which columns of $C_0$ are non-zero, or how many non-zero columns there are.
The rank of $X_0$ and the number of subspaces are also unknown. Can we recover the row
space of $X_0$, and the identities of the non-zero columns of $C_0$, efficiently and exactly? If so,
under which conditions?

This problem is motivated by the subspace segmentation problem, an important problem
in machine learning and computer vision that attracts a tremendous amount of research
effort (e.g., Costeira and Kanade, 1998; Eldar and Mishali, 2009; Elhamifar and Vidal, 2009;
Fischler and Bolles, 1981; Gear, 1998; Gruber and Weiss, 2004; Liu et al., 2010c,b; Rao et al.,
2010; Vidal, 2011, and many others). As often in computer vision and image processing
applications, one observes data points drawn from the union of multiple subspaces (Ma et al.).
The goal of subspace segmentation is to segment the samples into their respective
subspaces. Indeed, subspace segmentation can be regarded as a generalization
of Principal Component Analysis (PCA), which has only one subspace. As such, similar to
PCA, segmentation algorithms can be sensitive to the presence of outliers. In fact, because
of the coupling between segmentation and outlier detection, robust subspace segmentation
appears to be a challenging problem, and very few methods with theoretical guarantees, if
any, have been proposed in the literature.

Our main thrust, as we show below in Section 2.3, is the fact that the row space of the
data samples $X_0$ determines the correct segmentation. Thus, both subspace segmentation
and outlier detection can be transformed into solving Problem (1), where the column support
of $C_0$ indicates the outliers, and the row space of $X_0$ gives the segmentation result of the
"authentic" samples. To this end, we analyze the following convex optimization problem,
termed Low-Rank Representation (LRR) (Liu et al., 2010):

$$\min_{Z,C} \; \|Z\|_* + \lambda \|C\|_{2,1}, \quad \text{s.t.} \quad X = XZ + C, \qquad (2)$$

where $\|\cdot\|_*$ denotes the sum of the singular values, also known as the nuclear norm (Fazel,
2002), the trace norm or the Ky Fan norm; $\|\cdot\|_{2,1}$ is called the $\ell_{2,1}$ norm and is defined as
the sum of $\ell_2$ norms of the columns of a matrix, and the parameter $\lambda > 0$ is used to balance
the effects of the two parts.
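
The two norms and the objective of Problem (2) are straightforward to evaluate numerically. The following is a minimal sketch in Python with NumPy; the function names (`nuclear_norm`, `l21_norm`, `lrr_objective`, `is_feasible`) and the tolerance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def nuclear_norm(M):
    # Sum of the singular values of M.
    return np.linalg.svd(M, compute_uv=False).sum()

def l21_norm(M):
    # Sum of the Euclidean norms of the columns of M.
    return np.linalg.norm(M, axis=0).sum()

def lrr_objective(Z, C, lam):
    # Objective of Problem (2): ||Z||_* + lambda * ||C||_{2,1}.
    return nuclear_norm(Z) + lam * l21_norm(C)

def is_feasible(X, Z, C, tol=1e-8):
    # Checks the constraint X = XZ + C up to a relative tolerance (assumed value).
    return np.linalg.norm(X - X @ Z - C) <= tol * max(1.0, np.linalg.norm(X))

# The pair (Z, C) = (I, 0) is always feasible; its objective equals n.
X = np.arange(6.0).reshape(2, 3)
Z, C = np.eye(3), np.zeros((2, 3))
objective_at_identity = lrr_objective(Z, C, lam=0.5)
```

The trivial pair $(I, 0)$ shows why minimization matters: it satisfies the constraint for any $X$ but carries no information about the subspaces.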

Using the nuclear-norm based approach to tackle the subspace segmentation problem
is not a completely new idea. In Liu et al. (2010b), the authors showed that if there is no
outlier, then the formulation

$$\min_{Z} \; \|Z\|_*, \quad \text{s.t.} \quad X = XZ,$$

exactly solves the subspace segmentation problem. They further conjectured that in the
presence of corruptions, the formulation (2) may be helpful. However, no theoretical analysis
was offered. In contrast, we show that under mild conditions, both the row space of $X_0$ and
the column support of $C_0$ can be recovered by solving Problem (2). Thus, one can
simultaneously perform subspace segmentation and outlier detection in an efficient way. While our
analysis shares similar features with previous work in Robust Principal Component Analysis
(RPCA), including Candès et al. (2009); Xu et al. (2010), it is complicated by the fact that
the variable $Z$ is left-multiplied by a dictionary matrix $X$, and (perhaps more significantly)
by the fact that the dictionary itself is contaminated by outliers. Also, it is worth noting
that the problem of recovering the row space with column-wise corruptions essentially cannot
be addressed by existing RPCA methods (Torre and Black, 2001; Xu et al., 2010), which
are designed for recovering the column space with column-wise corruptions. In this regard,
LRR also has a unique role in solving the RPCA problem in the context of corrupted
features (i.e., row-wise corruptions); that is, one can recover the column space with row-wise
corruptions by solving the following transposed version of (2):

$$\min_{Z,C} \; \|Z\|_* + \lambda \|C\|_{2,1}, \quad \text{s.t.} \quad X^T = X^T Z + C.$$

As discussed above, existing RPCA methods (e.g., Xu et al., 2010) that focus on recovering
the column space with column-wise corruptions are fundamentally unable to address this
problem.

The remainder of this paper is organized as follows. Section 2 introduces some
preliminaries for reading this paper. The main results of this paper are presented and proven
in Section 3 and Section 4, respectively. Section 5 presents the experimental results and
Section 6 concludes this paper.

2. Preliminaries 

For ease of reading, we introduce in this section some preliminaries, including the usage
of mathematical notations, the concept of independent subspaces, the role of row space in
subspace segmentation, and some previous results about recovering row space by LRR.

2.1 Summary of Notations 

Capital letters such as $M$ are used to represent matrices, and accordingly, $[M]_i$ denotes the
$i$-th column vector of $M$. Letters $U$, $V$, $\mathcal{I}$ and their variants (complements, subscripts,
etc.) are reserved for column space, row space and column support, respectively. There are
four associated projection operators we use throughout. The projection onto the column
space, $U$, is denoted by $\mathcal{P}_U$ and given by $\mathcal{P}_U(M) = UU^T M$, and similarly for the row space
$\mathcal{P}_V(M) = MVV^T$. Sometimes, we need to apply $\mathcal{P}_V$ on the left side of a matrix. This
special operator is denoted by $\mathcal{P}_V^L$ and given by $\mathcal{P}_V^L(\cdot) = VV^T(\cdot)$. The matrix $\mathcal{P}_{\mathcal{I}}(M)$ is
obtained from $M$ by setting column $[M]_i$ to zero for all $i \notin \mathcal{I}$. Finally, $\mathcal{P}_T$ is the projection
to the space spanned by $U$ and $V$, and given by $\mathcal{P}_T(\cdot) = \mathcal{P}_U(\cdot) + \mathcal{P}_V(\cdot) - \mathcal{P}_U\mathcal{P}_V(\cdot)$. Note
that $\mathcal{P}_T$ depends on both $U$ and $V$, and we suppress this notation wherever it is clear
which $U$ and $V$ we are using. The complementary operators, $\mathcal{P}_{U^\perp}$, $\mathcal{P}_{V^\perp}$, $\mathcal{P}_{T^\perp}$, $\mathcal{P}^L_{V^\perp}$ and $\mathcal{P}_{\mathcal{I}^c}$,
are defined as usual (e.g., Xu et al., 2010). The same notation is also used to represent a
subspace of matrices: e.g., we write $M \in \mathcal{P}_U$ for any matrix $M$ that satisfies $\mathcal{P}_U(M) = M$.
Five matrix norms are used: $\|M\|_*$ is the nuclear norm, $\|M\|_{2,1}$ is the sum of $\ell_2$ norms of the
columns $[M]_i$, $\|M\|_{2,\infty}$ is the largest $\ell_2$ norm of the columns, and $\|M\|_F$ is the Frobenius
norm. The largest singular value of a matrix (i.e., the spectral norm) is $\|M\|$, and the
smallest positive singular value is denoted by $\sigma_{\min}(M)$. The only vector norm used is $\|\cdot\|_2$,
the $\ell_2$ norm. Depending on the context, $I$ is either the identity matrix or the identity
operator, and $e_i$ is the $i$-th standard basis vector.
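
The projection operators above translate directly into matrix products. The sketch below is an illustration in Python with NumPy; the helper names (`proj_col`, `proj_row`, `proj_support`, `proj_T`) are hypothetical, and `U`, `V` are assumed to have orthonormal columns.

```python
import numpy as np

def proj_col(U, M):
    # P_U(M) = U U^T M (also the left-side operator P_V^L when U is a row-space basis).
    return U @ (U.T @ M)

def proj_row(V, M):
    # P_V(M) = M V V^T.
    return (M @ V) @ V.T

def proj_support(idx, M):
    # P_I(M): keep the columns indexed by idx, set all other columns to zero.
    out = np.zeros_like(M)
    out[:, idx] = M[:, idx]
    return out

def proj_T(U, V, M):
    # P_T = P_U + P_V - P_U P_V.
    return proj_col(U, M) + proj_row(V, M) - proj_col(U, proj_row(V, M))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))  # a rank-2 matrix
Uf, s, Vtf = np.linalg.svd(A, full_matrices=False)
U, V = Uf[:, :2], Vtf[:2].T        # column-space and row-space bases of A
M = rng.standard_normal((5, 6))    # an arbitrary test matrix
```

Since $\mathcal{P}_U$ acts on the left and $\mathcal{P}_V$ on the right, the two commute, which is why $\mathcal{P}_T$ is itself a projection.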

We reserve letters $X$, $Z$, $C$ and their variants (complements, subscripts, etc.) for the
data matrix (also the dictionary), coefficient matrix (in LRR) and outlier matrix,
respectively. The SVDs of $X_0$ and $X$ are $U_0\Sigma_0V_0^T$ and $U_X\Sigma_XV_X^T$, respectively. We use $\mathcal{I}_0$ to denote
the column support of $C_0$, $d$ the ambient data dimension, $n$ the total number of data points
in $X$, $\gamma = |\mathcal{I}_0|/n$ the fraction of outliers, and $r_0$ the rank of $X_0$. For a convex function
$f : \mathbb{R}^{m \times m'} \to \mathbb{R}$, we say that $Y$ is a subgradient of $f$ at $M$, denoted as $Y \in \partial f(M)$, if and
only if $f(M') \ge f(M) + \langle M' - M, Y \rangle, \forall M'$. We also adopt the conventions of using $\mathrm{span}(M)$
to denote the linear space spanned by the columns of a matrix $M$, using $y \in \mathrm{span}(M)$ to
denote that a vector $y$ belongs to the space $\mathrm{span}(M)$, and using $Y \in \mathrm{span}(M)$ to denote that
all column vectors of $Y$ belong to $\mathrm{span}(M)$. A list of notations can be found in Appendix B
for the convenience of readers.

2.2 Independent Subspaces 

The concept of independence will be used in our analysis. Its definition is as follows:

Definition 1 A collection of $k$ ($k \ge 2$) subspaces $\{S_1, S_2, \cdots, S_k\}$ are independent if
$S_i \cap \sum_{j \ne i} S_j = \{0\}$ for $i = 1, \cdots, k$.

A closely related concept is pairwise disjointness, which means there is no nontrivial intersection
between any two subspaces, i.e., $S_i \cap S_j = \{0\}, \forall i \ne j$. It is easy to see that when there are
only two subspaces (i.e., $k = 2$), independence is equivalent to pairwise disjointness. On
the other hand, when $k > 2$, independence is a sufficient condition for pairwise disjointness,
but not a necessary one.
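
The gap between the two notions is easy to see on three distinct lines through the origin in $\mathbb{R}^2$: any two of them intersect only at zero, yet each line lies in the sum of the other two. The sketch below checks this numerically, using the standard characterization that subspaces are independent exactly when the dimension of their sum equals the sum of their dimensions. The function names and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def are_independent(bases, tol=1e-10):
    # Independent iff dim(S_1 + ... + S_k) equals dim(S_1) + ... + dim(S_k).
    total = sum(np.linalg.matrix_rank(B, tol=tol) for B in bases)
    return np.linalg.matrix_rank(np.hstack(bases), tol=tol) == total

def are_pairwise_disjoint(bases, tol=1e-10):
    # Pairwise disjoint iff every pair of subspaces is independent.
    k = len(bases)
    return all(are_independent([bases[i], bases[j]], tol)
               for i in range(k) for j in range(i + 1, k))

# Three distinct lines through the origin in R^2, each given by a spanning vector.
lines = [np.array([[1.0], [0.0]]),
         np.array([[0.0], [1.0]]),
         np.array([[1.0], [1.0]])]
```

Here `are_pairwise_disjoint(lines)` holds while `are_independent(lines)` does not, since the three lines together span only a 2-dimensional space.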



2.3 Relation Between Row Space and Segmentation 

The subspace memberships of the authentic samples are determined by the row space $V_0$.
Indeed, as shown in Costeira and Kanade (1998); Gear (1998), when subspaces are
independent, $V_0V_0^T$ forms a block-diagonal matrix: the $(i, j)$-th entry of $V_0V_0^T$ can be non-zero
only if the $i$-th and $j$-th samples are from the same subspace. Hence, this matrix, termed
the Shape Interaction Matrix (SIM) (Gear, 1998), has been widely used for subspace
segmentation (Costeira and Kanade, 1998; Gear, 1998; Vidal, 2011). Previous approaches simply
compute the SVD of the data matrix $X = U_X\Sigma_XV_X^T$ and then use $|V_XV_X^T|$ for subspace
segmentation. However, in the presence of outliers, $V_X$ can be far away from $V_0$ and thus
the segmentation using such approaches may be inaccurate. In contrast, we show that LRR
can recover $V_0V_0^T$ even when the data matrix $X$ is corrupted by outliers.
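
The block-diagonal structure of $V_0V_0^T$ under independence can be observed on synthetic data. The sketch below draws three random 3-dimensional subspaces in $\mathbb{R}^{30}$ (independent with probability one) and measures the largest entry of $V_0V_0^T$ linking samples from different subspaces. All sizes, the seed and the variable names are assumptions chosen for this illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, dim, per, k = 30, 3, 10, 3   # ambient dim, subspace dim, samples per subspace, number of subspaces

blocks = []
for _ in range(k):
    basis = np.linalg.qr(rng.standard_normal((d, dim)))[0]   # orthonormal basis of one subspace
    blocks.append(basis @ rng.standard_normal((dim, per)))   # samples drawn from that subspace
X0 = np.hstack(blocks)
labels = np.repeat(np.arange(k), per)

r0 = np.linalg.matrix_rank(X0)
V0 = np.linalg.svd(X0, full_matrices=False)[2][:r0].T        # n x r0 basis of the row space
sim = V0 @ V0.T                                              # the SIM

# Entries of the SIM that connect samples from different subspaces.
cross = labels[:, None] != labels[None, :]
max_cross = np.abs(sim[cross]).max()
```

With independent subspaces `max_cross` is zero up to floating-point error, so $|V_0V_0^T|$ can serve as an affinity matrix for clustering.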

In practice, the subspaces may not be independent. As one would expect, in this case
$V_0V_0^T$ is not necessarily block-diagonal, since when the subspaces have nontrivial
intersections, some samples may belong to multiple subspaces simultaneously. Nevertheless,
recovering $V_0V_0^T$ is still of interest to subspace segmentation. Indeed, numerical experiments
have shown that, as long as the subspaces are pairwise disjoint (but not independent), $V_0V_0^T$
is close to block-diagonal (Liu et al., 2010a), as exemplified in Figure 1. Note that the
analysis in this paper focuses on when $V_0V_0^T$ can be recovered, and hence does not rely on
whether or not the subspaces are independent.



Figure 1: An example of the matrix $V_0V_0^T$ computed from dependent subspaces. In this
example, we create 11 pairwise disjoint subspaces each of which is of dimension 20,
and draw 20 samples from each subspace. The ambient dimension is 200, which
is smaller than the sum of the dimensions of the subspaces. So the subspaces are
dependent and $V_0V_0^T$ is not strictly block-diagonal. Nevertheless, it is simple to
see that high segmentation accuracy can be achieved by using the above similarity
matrix to do spectral clustering.

2.4 Relation Between Row Space and LRR

To better illustrate our intuition, we begin with the "ideal" case where there is no outlier in
the data: i.e., $X = X_0$ and $C_0 = 0$. Thus, the LRR problem reduces to $\min_Z \|Z\|_*$ s.t. $X_0 = X_0Z$.
As shown in Liu et al. (2010a), this problem has a unique solution $Z^* = V_0V_0^T$, i.e.,
the solution of LRR identifies the row space of $X_0$ in this special case. Thus, when the data
are contaminated by outliers, it is natural to consider Problem (2). The following lemma,
implied by Theorem 4.3 of Liu et al. (2010a), sheds insight on when LRR recovers the row
space.

Lemma 1 For any optimal solution $(Z^*, C^*)$ to the LRR problem (2), we have that

$$Z^* \in \mathcal{P}^L_{V_X},$$

i.e., $Z^* \in \mathrm{span}(X^T)$, where $V_X$ is the row space of $X$.

The above lemma states that the optimal solution (with respect to the variable $Z$) to LRR
always lies within the row space of $X$. This provides us an important clue on the
conditions for recovering $V_0V_0^T$ by $Z^*$.
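
In the outlier-free case the claims of this subsection can be checked directly: $Z = V_0V_0^T$ satisfies $X_0 = X_0Z$, has nuclear norm $r_0$, and lies in the row space of the data as Lemma 1 requires. The following sketch verifies these properties on a random rank-4 matrix. It does not verify uniqueness or optimality, and the sizes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.standard_normal((20, 4)) @ rng.standard_normal((4, 15))  # rank-4 data, no outliers
r0 = np.linalg.matrix_rank(X0)
V0 = np.linalg.svd(X0, full_matrices=False)[2][:r0].T             # n x r0 basis of the row space
Z_star = V0 @ V0.T

residual = np.linalg.norm(X0 - X0 @ Z_star)                # feasibility: X0 = X0 Z
nuc = np.linalg.svd(Z_star, compute_uv=False).sum()        # nuclear norm of Z_star, equal to r0
nuc_identity = float(X0.shape[1])                          # Z = I is also feasible, with norm n
row_space_gap = np.linalg.norm(V0 @ (V0.T @ Z_star) - Z_star)  # P^L_{V_X}(Z) = Z, here X = X0
```

The comparison `nuc < nuc_identity` illustrates why the nuclear norm favours the row-space projector over the trivial feasible point $Z = I$.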



3. Settings and Results 

In this section we present our main result: under mild assumptions detailed below, LRR
can exactly recover both the row space of $X_0$ (i.e., the true SIM that encodes the subspace
memberships of the samples) and the column support of $C_0$ (i.e., the identities of the
outliers) from $X$.

While several articles, e.g., Candès and Recht (2009); Candès et al. (2009); Xu et al.
(2010), have shown that nuclear norm regularized optimization problems are powerful
in dealing with corruptions including missed observations and outliers, it is considerably
more challenging to establish the success conditions of LRR. This is partly due to the




Liu, Xu and Yan 



bilinear interaction between the corrupted matrix $X = X_0 + C_0$ and the unknown $Z$ in the
equation $X_0 + C_0 = (X_0 + C_0)Z + C$, which is essentially a matrix recovery task under a noisy
dictionary, a topic not studied in the literature to the best of our knowledge. Moreover, our
goal is to recover the row space from column-wise corruptions. This is a new task not addressed
by previous RPCA and matrix recovery methods that mainly focus on recovering the column
space (Candès et al., 2009; Candès and Plan, 2010; Candès and Recht, 2009; Devlin et al.,
1981; Torre and Black, 2001; Wright et al., 2009), and hence calls for new
analysis tools.



3.1 Problem Settings 

We discuss in this subsection three conditions sufficient for LRR to succeed. Note that 
these conditions also reveal how the outliers and samples are defined in LRR. 

3.1.1 A Necessary Condition for Exact Recovery 

Suppose $(Z^*, C^*)$ is an optimal solution to (2); then Lemma 1 concludes that the column
space of $Z^*$ is a subspace of $V_X$. Hence, for $Z^*$ (or a part of $Z^*$) to exactly recover $V_0$, $V_0$
must be a subspace of $V_X$, i.e., the following is a necessary condition:

$$V_0 \in \mathcal{P}^L_{V_X}. \qquad (3)$$

To show how the above assumption can hold, we establish the following lemma, which shows
that (3) can be satisfied when the outliers are independent of the samples (the proof is
presented in Appendix A.1).

Lemma 2 If $\mathrm{span}(C_0)$ and $\mathrm{span}(X_0)$ are independent of each other, i.e., $\mathrm{span}(C_0) \cap
\mathrm{span}(X_0) = \{0\}$, then (3) holds.



3.1.2 Relatively Well-Definedness 

As we discussed earlier, one technical challenge to the analysis of LRR comes from the
bilinear interaction between the corrupted matrix $X = X_0 + C_0$ and the unknown $Z$ in the
equation $X = XZ + C$. In fact, because the (outlier corrupted) data matrix $X$ is used as
the dictionary, certain conditions to ensure that the dictionary is "well-behaved" appear to
be necessary. In particular, we need the following relatively well-defined (RWD) condition.

Definition 2 The dictionary $X$ generated by $X = X_0 + C_0$, with SVD $X = U_X\Sigma_XV_X^T$ and
$X_0 = U_0\Sigma_0V_0^T$, is said to be RWD (with regard to $X_0$) with parameter $\beta$ if

$$\|\Sigma_X^{-1}V_X^TV_0\| \le \frac{1}{\beta\|X\|}. \qquad (4)$$

Figure 2: Plotting the RWD parameter $\beta = 1/(\|X\| \|\Sigma_X^{-1}V_X^TV_0\|)$ as a function of the
relative magnitude $\|C_0\|/\|X_0\|$. These results are averaged from 100 random trials.
In those experiments, the outlier fraction is fixed to be $\gamma = 0.5$, and the outlier
magnitude is varied for investigation. The matrices $X_0$ and $C_0$ are generated in
a similar way as in Section 5.

For LRR to succeed, the RWD parameter $\beta$ cannot be too small. Notice that $\beta$ can be
loosely bounded by

$$\beta \ge \frac{1}{\mathrm{cond}(X)},$$

where $\mathrm{cond}(X) = \|X\|/\sigma_{\min}(X)$ is the condition number of $X$. This implies that $\beta = 1$
when $X$ is "perfectly well-defined" (e.g., $r_0 = 1$ and $C_0 = 0$). However, when $X$ is severely
singular (e.g., due to the presence of outliers), this bound is too loose to guarantee that RWD
holds. In this case, we can apply the following bound, which essentially states that the
RWD parameter $\beta$ is reasonably large when the outliers are not too large. See Appendix
A.2 for the proof.

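Lemma 3 If $\mathrm{span}(C_0)$ and $\mathrm{span}(X_0)$ are independent of each other, then

$$\beta \ge \frac{\sin(\theta)}{\mathrm{cond}(X_0)\left(1 + \frac{\|C_0\|}{\|X_0\|}\right)},$$

where $\mathrm{cond}(X_0) = \|X_0\|/\sigma_{\min}(X_0)$ is the condition number of $X_0$, and $\theta > 0$ is the smallest
principal angle between $\mathrm{span}(C_0)$ and $\mathrm{span}(X_0)$.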

Remark 1 To ensure that $\beta$ is reasonably large, the above lemma states that the outlier
magnitude (compared to the sample magnitude) should not be too large. This is verified by
our numerical experiments, as shown in Figure 2.

Remark 2 To ensure that $\beta$ is reasonably large, the above lemma also states that the
principal angle $\theta$ should be notably large; that is, the outliers in LRR are restricted to the data
points which are notably far away from the underlying subspaces. This conclusion is
consistent with the experimental observations reported in Liu et al. (2010a), which show that
LRR can distinguish between the outliers (corresponding to large $\theta$) and the corrupted
samples (corresponding to small $\theta$), where a corrupted sample is sampled from the subspaces,
but does not exactly lie on the underlying subspaces due to the corruptions.
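
The RWD parameter can be computed from the two SVDs, taking it as the largest $\beta$ allowed by Definition 2, i.e. $\beta = 1/(\|X\| \|\Sigma_X^{-1}V_X^TV_0\|)$ as in the caption of Figure 2. The sketch below is an illustration only: the function name `rwd_parameter`, the rank tolerance, and the synthetic data (Gaussian samples and outliers, with $X_0$ zero on the outlier columns) are assumptions, not the paper's experimental protocol.

```python
import numpy as np

def rwd_parameter(X, X0, tol=1e-10):
    # Skinny SVD of the dictionary X, truncated to its numerical rank.
    sx, Vxt = np.linalg.svd(X, full_matrices=False)[1:]
    rx = int((sx > tol * sx[0]).sum())
    sx, Vx = sx[:rx], Vxt[:rx].T
    # Row-space basis V_0 of the clean samples X0.
    s0, V0t = np.linalg.svd(X0, full_matrices=False)[1:]
    r0 = int((s0 > tol * s0[0]).sum())
    V0 = V0t[:r0].T
    core = (Vx.T @ V0) / sx[:, None]              # Sigma_X^{-1} V_X^T V_0
    beta = 1.0 / (sx[0] * np.linalg.norm(core, 2))
    return beta, sx[-1] / sx[0]                   # beta and the loose bound 1 / cond(X)

rng = np.random.default_rng(0)
d, n, r0, n_out = 20, 40, 3, 5
X0 = np.zeros((d, n))
C0 = np.zeros((d, n))
X0[:, n_out:] = rng.standard_normal((d, r0)) @ rng.standard_normal((r0, n - n_out))
C0[:, :n_out] = rng.standard_normal((d, n_out))   # outliers occupy the first n_out columns

beta, inv_cond = rwd_parameter(X0 + C0, X0)
beta_clean, inv_cond_clean = rwd_parameter(X0, X0)
```

Without outliers the loose bound is attained (`beta_clean` equals `inv_cond_clean`), while with outliers `beta` is at least `inv_cond`. Scaling `C0` up or down in this sketch is one way to reproduce the qualitative trend described in Remark 1.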
3.1.3 Incoherence

Finally, as is now standard (Candès and Recht, 2009; Candès et al., 2009; Xu et al., 2010),
we require the incoherence condition to hold, to avoid the issue of un-identifiability. As an
extreme example, consider the case where the data matrix $X_0$ is non-zero in only one column.
Such a matrix is both low-rank and column-sparse, thus the problem is unidentifiable. To
make the problem meaningful, the low-rank matrix $X_0$ cannot itself be column-sparse. This
is ensured via the following incoherence condition.

Definition 3 The matrix $X_0 \in \mathbb{R}^{d \times n}$ with SVD $X_0 = U_0\Sigma_0V_0^T$, $\mathrm{rank}(X_0) = r_0$ and $(1-\gamma)n$
of whose columns are non-zero, is said to be column-incoherent with parameter $\mu$ if

$$\max_i \|V_0^Te_i\|_2^2 \le \frac{\mu r_0}{(1-\gamma)n}, \qquad (5)$$

where $\{e_i\}$ are the standard basis vectors.

Thus if $V_0$ has a column aligned with a coordinate axis, then $\mu = (1-\gamma)n/r_0$. Similarly,
if $V_0$ is perfectly incoherent (e.g., if $r_0 = 1$ and every non-zero entry of $V_0$ has magnitude
$1/\sqrt{(1-\gamma)n}$), then $\mu = 1$.
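
The two extreme cases just described can be reproduced numerically. The sketch below computes the smallest $\mu$ satisfying the incoherence condition, reading (5) as $\max_i \|V_0^Te_i\|_2^2 \le \mu r_0/((1-\gamma)n)$, which is the form consistent with both examples in the text. The function name `incoherence`, the rank tolerance, and the two small test matrices are assumptions for illustration.

```python
import numpy as np

def incoherence(X0, gamma, tol=1e-10):
    # Smallest mu with max_i ||V_0^T e_i||_2^2 <= mu * r0 / ((1 - gamma) * n).
    n = X0.shape[1]
    s, Vt = np.linalg.svd(X0, full_matrices=False)[1:]
    r0 = int((s > tol * s[0]).sum())
    V0 = Vt[:r0].T
    row_energy = (V0 ** 2).sum(axis=1)   # ||V_0^T e_i||_2^2 for every i
    return row_energy.max() * (1.0 - gamma) * n / r0

# Perfectly incoherent: rank one, all eight columns identical (no outliers, gamma = 0).
flat = np.outer(np.array([1.0, 2.0, 3.0]), np.ones(8))
mu_flat = incoherence(flat, gamma=0.0)

# A column of V_0 aligned with a coordinate axis: the first sample spans its own direction.
spiky = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0, 1.0]])
mu_spiky = incoherence(spiky, gamma=0.0)
```

For `flat` the result is $\mu = 1$, and for `spiky` it is $\mu = (1-\gamma)n/r_0 = 4/2 = 2$, matching the two cases above.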

3.2 The Main Result 

In the following theorem, we present our main result: under mild technical conditions, any
solution $(Z^*, C^*)$ to (2) exactly recovers the row space of $X_0$ and the column support of $C_0$
simultaneously.

Theorem 1 Suppose a given data matrix $X$ is generated by $X = X_0 + C_0$, where $X_0$ is
of rank $r_0$, $X$ has RWD parameter $\beta$ and $X_0$ has incoherence parameter $\mu$. Suppose $C_0$ is
supported on $\gamma n$ columns. Let $\gamma^*$ be such that

$$\frac{\gamma^*}{1-\gamma^*} = \frac{324\beta^2}{49(11+4\beta)^2\mu r_0}, \qquad (6)$$

then LRR with parameter $\lambda = \frac{3}{7\|X\|\sqrt{\gamma^* n}}$ strictly succeeds, as long as $\gamma \le \gamma^*$ and (3) holds.
Here, LRR "strictly succeeds" means that any optimal solution $(Z^*, C^*)$ to (2) satisfies

$$U^*(U^*)^T = V_0V_0^T \quad \text{and} \quad \mathcal{I}^* = \mathcal{I}_0, \qquad (7)$$

where $U^*$ is the column space of $Z^*$, and $\mathcal{I}^*$ is the column support of $C^*$.

Theorem 1 indeed states that the fraction of outliers that LRR can successfully handle,
namely $\gamma^*$, depends on the rank $r_0$ (the lower the better), the RWD parameter $\beta$ (the
larger the better), and the incoherence parameter $\mu$ (the smaller the better).

Recall that as discussed in the introduction, LRR can be used to solve PCA tasks with
feature-wise corruption by solving a transposed version of Problem (2). Hence, Theorem 1
also provides a theoretical guarantee in this setup.
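
This section does not specify how Problem (2) is solved numerically. One common choice for problems of this form is an augmented-Lagrangian (ADMM-style) scheme with an auxiliary variable $J = Z$; the sketch below follows that generic recipe and should be read as an assumption-laden illustration, not as the authors' implementation. The function names, the penalty schedule (`mu`, `rho`, `mu_max`), the iteration count and the value of `lam` in the demo are all arbitrary choices. Note that `mu` here is the penalty parameter of the solver, unrelated to the incoherence parameter $\mu$.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(M, tau):
    # Column-wise shrinkage: proximal operator of tau * l_{2,1} norm.
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale

def lrr_admm(X, lam, mu=1e-2, rho=1.2, mu_max=1e6, n_iter=200):
    # Solves min ||J||_* + lam ||C||_{2,1}  s.t.  X = XZ + C,  Z = J  (sketch).
    d, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); C = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))      # Lagrange multipliers
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)
    for _ in range(n_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = inv @ (XtX - X.T @ C + J + (X.T @ Y1 - Y2) / mu)
        C = col_shrink(X - X @ Z + Y1 / mu, lam / mu)
        Y1 += mu * (X - X @ Z - C)
        Y2 += mu * (Z - J)
        mu = min(rho * mu, mu_max)
    return Z, C

# Synthetic demo: 30 samples from a 3-dimensional subspace plus 3 Gaussian outliers.
rng = np.random.default_rng(0)
d, n_in, n_out = 20, 30, 3
X = np.zeros((d, n_in + n_out))
X[:, :n_in] = rng.standard_normal((d, 3)) @ rng.standard_normal((3, n_in))
X[:, n_in:] = 3.0 * rng.standard_normal((d, n_out))
lam = 0.5                                   # arbitrary value, not the one from Theorem 1
Z, C = lrr_admm(X, lam=lam)
residual = np.linalg.norm(X - X @ Z - C)    # constraint violation of the returned pair
col_norms = np.linalg.norm(C, axis=0)       # large column norms flag candidate outliers
```

Given a solution, the quantities in (7) can be read off by taking the column space of `Z` (via its SVD, with a rank threshold) and the indices of the non-negligible entries of `col_norms`. Whether they match $V_0V_0^T$ and $\mathcal{I}_0$ depends on $\lambda$ and on the conditions of Theorem 1.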

4. Proof of Theorem 1

In this section, we present the detailed proofs of our main result, Theorem 1.
4.1 Roadmap of the Proof

In this subsection we provide an outline for the proof of Theorem 1. The proof follows three
main steps.

1. Equivalent Conditions: Identify the necessary and sufficient conditions (called
equivalent conditions), for any pair $(Z', C')$ to produce the exact results (7).

For any feasible pair $(Z', C')$ that satisfies $X = XZ' + C'$, let the SVD of $Z'$ be $U'\Sigma'V'^T$
and the column support of $C'$ be $\mathcal{I}'$. In order to produce the exact results (7), on the
one hand, a necessary condition is that $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$, as this is
nothing but $U'$ being a subspace of $V_0$ and $\mathcal{I}'$ being a subset of $\mathcal{I}_0$. On the other hand, it can
be proven that $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$ are sufficient to ensure $U'U'^T = V_0V_0^T$
and $\mathcal{I}' = \mathcal{I}_0$. So, the exactness described in (7) can be equally transformed into two
constraints: $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$, which we will use to construct an oracle
problem to facilitate the proof.

2. Dual Conditions: For a candidate pair $(Z', C')$ that respectively has the desired
row space and column support, identify the sufficient conditions for $(Z', C')$ to be an
optimal solution to the LRR problem (2). These conditions are called dual conditions.

For the pair $(Z', C')$ that satisfies $X = XZ' + C'$, $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$, let
the SVD of $Z'$ be $U'\Sigma'V'^T$ and the column-normalized version of $C'$ be $H'$. That is,
column $[H']_i = [C']_i/\|[C']_i\|_2$ for all $i \in \mathcal{I}_0$, and $[H']_i = 0$ for all $i \notin \mathcal{I}_0$ (note that the column
support of $C'$ is $\mathcal{I}_0$). Furthermore, define $\mathcal{P}_{T'}(\cdot) = \mathcal{P}_{U'}(\cdot) + \mathcal{P}_{V'}(\cdot) - \mathcal{P}_{U'}\mathcal{P}_{V'}(\cdot)$. With
these notations, it can be proven that $(Z', C')$ is an optimal solution to LRR if there
exists a matrix $Q$ that satisfies

$$\mathcal{P}_{T'}(X^TQ) = U'V'^T, \qquad \|X^TQ - \mathcal{P}_{T'}(X^TQ)\| < 1,$$
$$\mathcal{P}_{\mathcal{I}_0}(Q) = \lambda H', \qquad \|Q - \mathcal{P}_{\mathcal{I}_0}(Q)\|_{2,\infty} < \lambda.$$

Although the LRR problem (2) may have multiple solutions, it can be further proven
that any solution has the desired row space and column support, provided the above
conditions have been satisfied. So, the remaining job is to prove the above dual conditions,
i.e., construct the dual certificates.

3. Dual Certificates: Show that the dual conditions can be satisfied, i.e., construct
the dual certificates.

The construction of dual certificates mainly concerns a matrix $Q$ that satisfies the
dual conditions. However, since the dual conditions also depend on the pair $(Z', C')$,
we actually need to obtain three matrices, $Z'$, $C'$ and $Q$. This is done by considering
an alternate optimization problem, often called the "oracle problem". The oracle
problem arises by imposing the success conditions as additional constraints in (2):

$$\text{Oracle Problem:} \quad \min_{Z,C} \; \|Z\|_* + \lambda\|C\|_{2,1}, \quad \text{s.t.} \quad X = XZ + C, \; \mathcal{P}^L_{V_0}(Z) = Z, \; \mathcal{P}_{\mathcal{I}_0}(C) = C.$$

While it is not practical to solve the oracle problem since $V_0$ and $\mathcal{I}_0$ are both unknown,
it significantly facilitates our proof. Note that the above problem is always feasible, as
$(V_0V_0^T, C_0)$ is feasible. Thus, an optimal solution, denoted as $(\hat{Z}, \hat{C})$, exists. Observe
that because of the two additional constraints, $(\hat{Z}, \hat{C})$ satisfies (7). Therefore, to show
Theorem 1 holds, it suffices to show that $(\hat{Z}, \hat{C})$ is the optimal solution to LRR. With
this perspective, we would like to use $(\hat{Z}, \hat{C})$ to construct the dual certificates. Let the
SVD of $\hat{Z}$ be $\hat{U}\hat{\Sigma}\hat{V}^T$, and the column-normalized version of $\hat{C}$ be $\hat{H}$. It is easy to see
that there exists an orthonormal matrix $\bar{V}$ such that $\hat{U}\hat{V}^T = V_0\bar{V}^T$, where $V_0$ is the
row space of $X_0$. Moreover, it is easy to show that $\mathcal{P}_{\hat{U}}(\cdot) = \mathcal{P}^L_{V_0}(\cdot)$, $\mathcal{P}_{\hat{V}}(\cdot) = \mathcal{P}_{\bar{V}}(\cdot)$, and
hence the operator $\mathcal{P}_{\hat{T}}$ defined by $\hat{U}$ and $\hat{V}$ obeys $\mathcal{P}_{\hat{T}}(\cdot) = \mathcal{P}^L_{V_0}(\cdot) + \mathcal{P}_{\bar{V}}(\cdot) - \mathcal{P}^L_{V_0}\mathcal{P}_{\bar{V}}(\cdot)$.
Finally, the dual certificates are finished by constructing $Q$ as follows:

$$Q_1 = \lambda\mathcal{P}^L_{V_0}(X^T\hat{H}),$$
$$Q_2 = \lambda\mathcal{P}^L_{V_0^\perp}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\bar{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\bar{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\bar{V}})^i\Big)\mathcal{P}_{\bar{V}}(X^T\hat{H}),$$
$$Q = U_X\Sigma_X^{-1}V_X^T(V_0\bar{V}^T + \lambda X^T\hat{H} - Q_1 - Q_2),$$

where $U_X\Sigma_XV_X^T$ is the SVD of the data matrix $X$.

4.2 Equivalent Conditions

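Before starting the main proofs, we introduce the following lemmas, which are well-known
and will be used multiple times in the proof.

Lemma 4 For any column space $U$, row space $V$ and column support $\mathcal{I}$, the following holds.

1. Let the SVD of a matrix $M$ be $U\Sigma V^T$, then $\partial\|M\|_* = \{UV^T + W \mid \mathcal{P}_T(W) = 0, \|W\| \le 1\}$.

2. Let the column support of a matrix $M$ be $\mathcal{I}$, then $\partial\|M\|_{2,1} = \{H + L \mid \mathcal{P}_{\mathcal{I}}(H) =
H, [H]_i = [M]_i/\|[M]_i\|_2, \forall i \in \mathcal{I}; \mathcal{P}_{\mathcal{I}}(L) = 0, \|L\|_{2,\infty} \le 1\}$.

3. For any matrices $M$ and $N$ of consistent sizes, we have $\mathcal{P}_{\mathcal{I}}(MN) = M\mathcal{P}_{\mathcal{I}}(N)$.

4. For any matrices $M$ and $N$ of consistent sizes, we have $\mathcal{P}_U\mathcal{P}_{\mathcal{I}}(M) = \mathcal{P}_{\mathcal{I}}\mathcal{P}_U(M)$ and
$\mathcal{P}^L_V\mathcal{P}_{\mathcal{I}}(N) = \mathcal{P}_{\mathcal{I}}\mathcal{P}^L_V(N)$.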

Lemma 5 If a matrix $H$ satisfies $\|H\|_{2,\infty} \le 1$ and is supported on $\mathcal{I}$, then $\|H\| \le \sqrt{|\mathcal{I}|}$.

Proof This lemma is adapted from Xu et al. (2010). We present a proof here for
completeness.

$$\|H\| = \|H^T\| = \max_{\|x\|_2 \le 1}\|H^Tx\|_2 = \max_{\|x\|_2 \le 1}\sqrt{\sum_{i \in \mathcal{I}}(x^T[H]_i)^2} \le \sqrt{\sum_{i \in \mathcal{I}} 1} = \sqrt{|\mathcal{I}|}.$$

Lemma 6 For any two column-orthonormal matrices $U$ and $V$ of consistent sizes, we have
$\|UV^T\|_{2,\infty} = \max_i \|V^Te_i\|_2$.

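Lemma 7 For any matrices $M$ and $N$ of consistent sizes, we have

$$\|MN\|_{2,\infty} \le \|M\|\|N\|_{2,\infty},$$
$$|\langle M, N\rangle| \le \|M\|_{2,\infty}\|N\|_{2,1}.$$

Proof We have

$$\|MN\|_{2,\infty} = \max_i\|MNe_i\|_2 = \max_i\|M[N]_i\|_2 \le \max_i\|M\|\|[N]_i\|_2 = \|M\|\max_i\|[N]_i\|_2 = \|M\|\|N\|_{2,\infty},$$

$$|\langle M, N\rangle| = \Big|\sum_i[M]_i^T[N]_i\Big| \le \sum_i|[M]_i^T[N]_i| \le \sum_i\|[M]_i\|_2\|[N]_i\|_2
\le \sum_i\big(\max_j\|[M]_j\|_2\big)\|[N]_i\|_2 = \|M\|_{2,\infty}\|N\|_{2,1}.$$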
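
The inequalities of Lemmas 5 and 7 can be sanity-checked on random matrices. The sketch below is only a numerical illustration on one random draw (it proves nothing); the helper names, matrix sizes and support set are assumptions.

```python
import numpy as np

def norm_2inf(M):
    # Largest Euclidean norm among the columns of M.
    return np.linalg.norm(M, axis=0).max()

def norm_21(M):
    # Sum of the Euclidean norms of the columns of M.
    return np.linalg.norm(M, axis=0).sum()

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))
N = rng.standard_normal((5, 7))
A = rng.standard_normal((6, 7))
B = rng.standard_normal((6, 7))

# Lemma 7, first inequality: ||MN||_{2,inf} <= ||M|| ||N||_{2,inf}.
gap_product = np.linalg.norm(M, 2) * norm_2inf(N) - norm_2inf(M @ N)

# Lemma 7, second inequality: |<A, B>| <= ||A||_{2,inf} ||B||_{2,1}.
gap_inner = norm_2inf(A) * norm_21(B) - abs(np.sum(A * B))

# Lemma 5: unit-norm columns supported on |I| columns give ||H|| <= sqrt(|I|).
support = [1, 4, 5]
H = np.zeros((6, 7))
H[:, support] = A[:, support] / np.linalg.norm(A[:, support], axis=0)
gap_spectral = np.sqrt(len(support)) - np.linalg.norm(H, 2)
```

All three gaps are non-negative, as the lemmas require.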



The exactness described in (7) seems "mysterious". Actually, it can be "seamlessly"
achieved by imposing two additional constraints in (2), as shown in the following theorem.

Theorem 2 Let the pair $(Z', C')$ satisfy $X = XZ' + C'$. Denote the SVD of $Z'$ as $U'\Sigma'V'^T$,
and the column support of $C'$ as $\mathcal{I}'$. If $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$, then $U'U'^T = V_0V_0^T$
and $\mathcal{I}' = \mathcal{I}_0$.

Remark 3 The above theorem implies that the exactness described in (7) is equivalent to
two linear constraints: $\mathcal{P}^L_{V_0}(Z^*) = Z^*$ and $\mathcal{P}_{\mathcal{I}_0}(C^*) = C^*$. As will be seen, this can largely
facilitate the proof of Theorem 1.

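Proof To prove $U'U'^T = V_0V_0^T$, we only need to prove that $\mathrm{rank}(Z') \ge r_0$, as $\mathcal{P}^L_{V_0}(Z') = Z'$
implies that $U'$ is a subspace of $V_0$. Notice that $\mathcal{P}_{\mathcal{I}_0^c}(X) = X_0$. Then we have

$$X_0 = \mathcal{P}_{\mathcal{I}_0^c}(X) = \mathcal{P}_{\mathcal{I}_0^c}(XZ' + C') = \mathcal{P}_{\mathcal{I}_0^c}(XZ') = X\mathcal{P}_{\mathcal{I}_0^c}(Z').$$

So, $r_0 = \mathrm{rank}(X_0) = \mathrm{rank}(X\mathcal{P}_{\mathcal{I}_0^c}(Z')) \le \mathrm{rank}(\mathcal{P}_{\mathcal{I}_0^c}(Z')) \le \mathrm{rank}(Z')$.

To ensure $\mathcal{I}' = \mathcal{I}_0$, we only need to prove that $\mathcal{I}_0 \cap \mathcal{I}'^c = \emptyset$, since $\mathcal{P}_{\mathcal{I}_0}(C') = C'$ has
produced $\mathcal{I}' \subseteq \mathcal{I}_0$. Via some computations, we have that

$$\mathcal{P}_{\mathcal{I}_0}(X_0) = 0 \Longrightarrow U_0\Sigma_0\mathcal{P}_{\mathcal{I}_0}(V_0^T) = 0 \Longrightarrow \mathcal{P}_{\mathcal{I}_0}(V_0^T) = 0 \Longrightarrow V_0\mathcal{P}_{\mathcal{I}_0}(V_0^T) = 0. \qquad (8)$$

Also, we have

$$V_0 \in \mathcal{P}^L_{V_X} \Longrightarrow V_0^T = V_0^TV_XV_X^T \Longrightarrow V_0V_0^T = V_0V_0^TV_XV_X^T, \qquad (9)$$

which simply leads to $V_0V_0^TV_X\mathcal{P}_{\mathcal{I}_0}(V_X^T) = V_0\mathcal{P}_{\mathcal{I}_0}(V_0^T)$. Recalling (8), we further have

$$V_0\mathcal{P}_{\mathcal{I}_0}(V_0^T) = 0 \Longrightarrow V_0V_0^TV_X\mathcal{P}_{\mathcal{I}_0}(V_X^T) = 0 \Longrightarrow V_0V_0^TV_X\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(V_X^T) = 0, \qquad (10)$$

where the last equality holds because $\mathcal{I}_0 \cap \mathcal{I}'^c \subseteq \mathcal{I}_0$. Also, note that $\mathcal{I}_0 \cap \mathcal{I}'^c \subseteq \mathcal{I}'^c$. Then
we have the following:

$$X = XZ' + C' \Longrightarrow \mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(X) = X\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z')$$
$$\Longrightarrow \mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(V_X^T) = V_X^T\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z')$$
$$\Longrightarrow V_X\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(V_X^T) = V_XV_X^T\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z').$$

Recalling (9) and (10), we then have

$$V_0V_0^TV_X\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(V_X^T) = 0 \Longrightarrow V_0V_0^TV_XV_X^T\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z') = 0$$
$$\Longrightarrow V_0V_0^T\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z') = 0 \Longrightarrow \mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z') = 0, \qquad (11)$$

where the last equality is from the conclusion of $Z' = V_0V_0^TZ'$. By $X = X_0 + C_0$,

$$\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(C_0) = \mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(X - X_0) = \mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(X).$$

Notice that $\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(X) = X\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(Z')$. Then by (11), we have

$$\mathcal{P}_{\mathcal{I}_0 \cap \mathcal{I}'^c}(C_0) = 0, \quad \text{and so} \quad \mathcal{I}_0 \cap \mathcal{I}'^c = \emptyset.$$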



4.3 Dual Conditions 

To prove that LRR can exactly recover the row space and column support, Theorem 2
suggests that we prove that the pair $(Z', C')$ is a solution to (2), and that every solution to (2)
also satisfies the two constraints in Theorem 2. To this end, we write down the optimality
conditions of (2), resulting in the dual conditions for ensuring the exactness of LRR.

First, we define two operators that are closely related to the subgradients of $\|C'\|_{2,1}$
and $\|Z'\|_*$.

Definition 4 1. Let $(Z', C')$ satisfy $X = XZ' + C'$, $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$.
We define the following:

$$[\mathcal{B}(C')]_i = \frac{[C']_i}{\|[C']_i\|_2}, \; \forall i \in \mathcal{I}_0; \qquad [\mathcal{B}(C')]_i = 0, \; \forall i \notin \mathcal{I}_0.$$

Observe that $\mathcal{B}(C')$ is a column-normalized version of $C'$.
2. Let the SVD of $Z'$ be $U'\Sigma'V'^T$; we further define the operator $\mathcal{P}_{T(Z')}$ as

$$\mathcal{P}_{T(Z')}(\cdot) = \mathcal{P}_{U'}(\cdot) + \mathcal{P}_{V'}(\cdot) - \mathcal{P}_{U'}\mathcal{P}_{V'}(\cdot)
= \mathcal{P}^L_{V_0}(\cdot) + \mathcal{P}_{V'}(\cdot) - \mathcal{P}^L_{V_0}\mathcal{P}_{V'}(\cdot).$$

Next, we present and prove the dual conditions for exactly recovering the row space and
column support of $X_0$ and $C_0$, respectively.

Theorem 3 Let $(Z', C')$ satisfy $X = XZ' + C'$, $\mathcal{P}^L_{V_0}(Z') = Z'$ and $\mathcal{P}_{\mathcal{I}_0}(C') = C'$. Then
$(Z', C')$ is an optimal solution to (2) if there exists a matrix $Q$ that satisfies

(a) $\mathcal{P}_{T(Z')}(X^TQ) = U'V'^T$,

(b) $\|\mathcal{P}_{T(Z')^\perp}(X^TQ)\| < 1$,

(c) $\mathcal{P}_{\mathcal{I}_0}(Q) = \lambda\mathcal{B}(C')$,

(d) $\|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty} < \lambda$.

Further, if $\mathcal{P}_{\mathcal{I}_0} \cap \mathcal{P}_{V'} = \{0\}$, then any optimal solution to (2) will have the exact row space
and column support.

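Proof By standard convexity arguments (Rockafellar, 1970), a feasible pair $(Z', C')$ is an
optimal solution to (2) if there exists $Q'$ such that

$$Q' \in \partial\|Z'\|_* \quad \text{and} \quad Q' \in \lambda X^T\partial\|C'\|_{2,1}.$$

Note that (a) and (b) imply that $X^TQ \in \partial\|Z'\|_*$. Furthermore, letting $\mathcal{I}'$ be the column
support of $C'$, then by Theorem 2 we have $\mathcal{I}' = \mathcal{I}_0$. Therefore (c) and (d) imply that
$Q \in \lambda\partial\|C'\|_{2,1}$, and so $X^TQ \in \lambda X^T\partial\|C'\|_{2,1}$. Thus, $(Z', C')$ is an optimal solution to (2).

Notice that the LRR problem (2) may have multiple solutions. For any fixed $\Delta \ne 0$,
assume that $(Z' + \Delta_1, C' - \Delta)$ is also optimal. Then by $X = X(Z' + \Delta_1) + (C' - \Delta) =
XZ' + C'$, we have

$$\Delta = X\Delta_1.$$

By the well-known duality between operator norm and nuclear norm, there exists $W_0$ that
satisfies $\|W_0\| = 1$ and $\langle W_0, \mathcal{P}_{T(Z')^\perp}(\Delta_1)\rangle = \|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_*$. Let $W = \mathcal{P}_{T(Z')^\perp}(W_0)$, then
we have that $\|W\| \le 1$, $\langle W, \mathcal{P}_{T(Z')^\perp}(\Delta_1)\rangle = \|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_*$ and $\mathcal{P}_{T(Z')}(W) = 0$. Let $F$ be
such that

$$[F]_i = \begin{cases} -\dfrac{[\Delta]_i}{\|[\Delta]_i\|_2}, & \text{if } i \notin \mathcal{I}_0 \text{ and } [\Delta]_i \ne 0, \\ 0, & \text{otherwise.} \end{cases}$$

Then $\mathcal{P}_{T(Z')}(X^TQ) + W$ is a subgradient of $\|Z'\|_*$, and $\mathcal{P}_{\mathcal{I}_0}(Q)/\lambda + F$ is a subgradient of
$\|C'\|_{2,1}$. By the convexity of the nuclear norm and the $\ell_{2,1}$ norm, we have

$$\|Z' + \Delta_1\|_* + \lambda\|C' - \Delta\|_{2,1}$$
$$\ge \|Z'\|_* + \lambda\|C'\|_{2,1} + \langle\mathcal{P}_{T(Z')}(X^TQ) + W, \Delta_1\rangle - \lambda\langle\mathcal{P}_{\mathcal{I}_0}(Q)/\lambda + F, \Delta\rangle$$
$$= \|Z'\|_* + \lambda\|C'\|_{2,1} + \|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_* + \lambda\|\mathcal{P}_{\mathcal{I}_0^c}(\Delta)\|_{2,1} + \langle\mathcal{P}_{T(Z')}(X^TQ), \Delta_1\rangle - \langle\mathcal{P}_{\mathcal{I}_0}(Q), \Delta\rangle.$$

Notice that

$$\langle\mathcal{P}_{T(Z')}(X^TQ), \Delta_1\rangle - \langle\mathcal{P}_{\mathcal{I}_0}(Q), \Delta\rangle$$
$$= \langle X^TQ - \mathcal{P}_{T(Z')^\perp}(X^TQ), \Delta_1\rangle - \langle Q - \mathcal{P}_{\mathcal{I}_0^c}(Q), \Delta\rangle$$
$$= \langle-\mathcal{P}_{T(Z')^\perp}(X^TQ), \Delta_1\rangle + \langle\mathcal{P}_{\mathcal{I}_0^c}(Q), \Delta\rangle + \langle Q, X\Delta_1 - \Delta\rangle$$
$$= \langle-\mathcal{P}_{T(Z')^\perp}(X^TQ), \Delta_1\rangle + \langle\mathcal{P}_{\mathcal{I}_0^c}(Q), \Delta\rangle$$
$$\ge -\|\mathcal{P}_{T(Z')^\perp}(X^TQ)\|\|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_* - \|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty}\|\mathcal{P}_{\mathcal{I}_0^c}(\Delta)\|_{2,1},$$

where the last inequality is from Lemma 7 and the well-known conclusion that $|\langle M, N\rangle| \le
\|M\|\|N\|_*$ holds for any matrices $M$ and $N$.
The above deductions have proven that

$$\|Z' + \Delta_1\|_* + \lambda\|C' - \Delta\|_{2,1} \ge \|Z'\|_* + \lambda\|C'\|_{2,1} + (1 - \|\mathcal{P}_{T(Z')^\perp}(X^TQ)\|)\|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_*$$
$$+ (\lambda - \|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty})\|\mathcal{P}_{\mathcal{I}_0^c}(\Delta)\|_{2,1}.$$

However, since both $(Z', C')$ and $(Z' + \Delta_1, C' - \Delta)$ are optimal to (2), we must have

$$\|Z' + \Delta_1\|_* + \lambda\|C' - \Delta\|_{2,1} = \|Z'\|_* + \lambda\|C'\|_{2,1},$$

and so

$$(1 - \|\mathcal{P}_{T(Z')^\perp}(X^TQ)\|)\|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_* + (\lambda - \|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty})\|\mathcal{P}_{\mathcal{I}_0^c}(\Delta)\|_{2,1} \le 0.$$

Recalling the conditions (b) and (d), we then have

$$\|\mathcal{P}_{T(Z')^\perp}(\Delta_1)\|_* = \|\mathcal{P}_{\mathcal{I}_0^c}(\Delta)\|_{2,1} = 0,$$

i.e., $\mathcal{P}_{T(Z')}(\Delta_1) = \Delta_1$ and $\mathcal{P}_{\mathcal{I}_0}(\Delta) = \Delta$. By Lemma 1,

$$Z' \in \mathcal{P}^L_{V_X}, \quad Z' + \Delta_1 \in \mathcal{P}^L_{V_X}, \quad \text{and so} \quad \Delta_1 \in \mathcal{P}^L_{V_X}.$$

Also, notice that $\Delta = X\Delta_1$. Thus, we have

$$\mathcal{P}_{\mathcal{I}_0^c}(\Delta) = 0 \Longrightarrow X\mathcal{P}_{\mathcal{I}_0^c}(\Delta_1) = 0 \Longrightarrow V_X^T\mathcal{P}_{\mathcal{I}_0^c}(\Delta_1) = 0 \Longrightarrow V_XV_X^T\mathcal{P}_{\mathcal{I}_0^c}(\Delta_1) = 0$$
$$\Longrightarrow \mathcal{P}_{\mathcal{I}_0^c}(\mathcal{P}^L_{V_X}(\Delta_1)) = 0 \Longrightarrow \mathcal{P}_{\mathcal{I}_0^c}(\Delta_1) = 0,$$

which implies that $\mathcal{P}_{\mathcal{I}_0}(\Delta_1) = \Delta_1$. Furthermore, we have

$$\mathcal{P}_{\mathcal{I}_0}(\Delta_1) = \Delta_1 = \mathcal{P}_{T(Z')}(\Delta_1) = \mathcal{P}_{U'}(\Delta_1) + \mathcal{P}_{V'}\mathcal{P}_{U'^\perp}(\Delta_1)$$
$$= \mathcal{P}_{U'}(\mathcal{P}_{\mathcal{I}_0}(\Delta_1)) + \mathcal{P}_{V'}\mathcal{P}_{U'^\perp}(\Delta_1)$$
$$= \mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{U'}(\Delta_1) + \mathcal{P}_{V'}\mathcal{P}_{U'^\perp}(\Delta_1)$$
$$\Longrightarrow \mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{U'^\perp}(\Delta_1) = \mathcal{P}_{V'}\mathcal{P}_{U'^\perp}(\Delta_1).$$

Since $\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{U'^\perp}(\Delta_1) = \mathcal{P}_{U'^\perp}(\Delta_1)$, the above result implies that

$$\mathcal{P}_{U'^\perp}(\Delta_1) \in \mathcal{P}_{\mathcal{I}_0} \cap \mathcal{P}_{V'}.$$

By the assumption of $\mathcal{P}_{\mathcal{I}_0} \cap \mathcal{P}_{V'} = \{0\}$, we have $\mathcal{P}_{U'^\perp}(\Delta_1) = 0$. Recalling Theorem 2, we
have that $\mathcal{P}_{U'} = \mathcal{P}^L_{V_0}$, and so $\Delta_1 \in \mathcal{P}^L_{V_0}$. Thus, the solution $(Z' + \Delta_1, C' - \Delta)$ also satisfies
$X = X(Z' + \Delta_1) + (C' - \Delta)$, $\mathcal{P}^L_{V_0}(Z' + \Delta_1) = Z' + \Delta_1$ and $\mathcal{P}_{\mathcal{I}_0}(C' - \Delta) = C' - \Delta$. Recalling
Theorem 2 again, it can be concluded that the solution $(Z' + \Delta_1, C' - \Delta)$ also exactly
recovers the row space and column support, i.e., all possible solutions to (2) equally produce
the exact recovery. ■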



4.4 Obtaining Dual Certificates 

In this section, we complete the proof of Theorem 1 by constructing a matrix $Q$ that
satisfies the conditions in Theorem 3, and by proving $\mathcal{P}_{\mathcal{I}_0} \cap \mathcal{P}_{\hat{V}} = \{0\}$ as well. This is done
by considering an alternate optimization problem, often called the "oracle problem". The
oracle problem arises by imposing the equivalent conditions as additional constraints in (2):

$$\text{Oracle Problem:} \quad \min_{Z, C} \; \|Z\|_* + \lambda\|C\|_{2,1} \qquad (12)$$
$$\text{s.t.} \quad X = XZ + C, \quad \mathcal{P}^L_{V_0}(Z) = Z, \quad \mathcal{P}_{\mathcal{I}_0}(C) = C.$$

Note that the above problem is always feasible, as $(V_0V_0^T, C_0)$ is a feasible solution. Thus,
an optimal solution, denoted as $(\hat{Z}, \hat{C})$, exists. Observe that because of the two additional
constraints, $(\hat{Z}, \hat{C})$ satisfies (7). Therefore, to show that Theorem 1 holds, it suffices to show
that $(\hat{Z}, \hat{C})$ is an optimal solution to LRR. With this perspective, we next show that $(\hat{Z}, \hat{C})$
is an optimal solution to (2), and obtain the dual certificates from the optimality conditions of
the oracle problem.

In the rest of the paper, we need to use the following two notations: $\hat{U}\hat{\Sigma}\hat{V}^T$ is the SVD
of $\hat{Z}$, and $\hat{\mathcal{I}}$ is the column support of $\hat{C}$.
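The feasibility claim for the oracle problem can be illustrated numerically. The sketch below is an illustration under assumed toy sizes (the dimensions, the seed and the choice of outlier indices are hypothetical): it builds $X = X_0 + C_0$ with $X_0$ vanishing on the outlier columns, and checks that $(V_0V_0^T, C_0)$ satisfies all three constraints.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, r0, n_out = 20, 30, 3, 6            # illustrative sizes, not from the paper
n_in = n - n_out                           # the last n_out columns play the role of I_0

X0 = np.zeros((d, n))
X0[:, :n_in] = rng.standard_normal((d, r0)) @ rng.standard_normal((r0, n_in))
C0 = np.zeros((d, n))
C0[:, n_in:] = rng.standard_normal((d, n_out))
X = X0 + C0

# V0: the r0 leading right singular vectors of X0 (its row space).
_, _, Vt = np.linalg.svd(X0, full_matrices=False)
V0 = Vt[:r0].T
Z = V0 @ V0.T

residual = np.linalg.norm(X - (X @ Z + C0))        # X = XZ + C
left_proj_gap = np.linalg.norm(V0 @ V0.T @ Z - Z)  # P^L_{V0}(Z) = Z
support_gap = np.linalg.norm(C0[:, :n_in])         # P_{I0}(C) = C
```

The residual vanishes because the rows of $V_0$ indexed by the outliers are zero, so $C_0V_0 = 0$.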

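Lemma 8 There exists an orthonormal matrix $\bar{V}$ such that

$$\hat{U}\hat{V}^T = V_0\bar{V}^T.$$

In addition,

$$\mathcal{P}_{\hat{T}}(\cdot) = \mathcal{P}^L_{V_0}(\cdot) + \mathcal{P}_{\bar{V}}(\cdot) - \mathcal{P}^L_{V_0}\mathcal{P}_{\bar{V}}(\cdot).$$

Proof By Theorem 2, we have $\hat{U}\hat{U}^T = V_0V_0^T$. Let $\bar{V} = \hat{V}\hat{U}^TV_0$, then we have $\hat{U}\hat{V}^T = V_0\bar{V}^T$.
Note that $\hat{U}\hat{U}^T = V_0V_0^T$ leads to $\mathcal{P}_{\hat{U}} = \mathcal{P}^L_{V_0}$, and $\bar{V}\bar{V}^T = \hat{V}\hat{V}^T$ leads to $\mathcal{P}_{\bar{V}} = \mathcal{P}_{\hat{V}}$, so the
second claim follows. ■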
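The construction $\bar{V} = \hat{V}\hat{U}^TV_0$ in the proof can be sanity-checked on random data. This is a sketch under the assumption $\hat{U}\hat{U}^T = V_0V_0^T$, which is enforced here by setting $\hat{U} = V_0R$ for a random orthogonal $R$ (an illustrative choice, not the oracle solution itself).

```python
import numpy as np

rng = np.random.default_rng(2)
n, r0 = 12, 3                                         # illustrative sizes

V0, _ = np.linalg.qr(rng.standard_normal((n, r0)))    # orthonormal columns
R, _ = np.linalg.qr(rng.standard_normal((r0, r0)))    # orthogonal r0 x r0 matrix
U_hat = V0 @ R                                        # hence U_hat U_hat^T = V0 V0^T
V_hat, _ = np.linalg.qr(rng.standard_normal((n, r0)))

V_bar = V_hat @ U_hat.T @ V0                          # the matrix built in the proof

gap_factor = np.linalg.norm(U_hat @ V_hat.T - V0 @ V_bar.T)      # U_hat V_hat^T = V0 V_bar^T
gap_proj = np.linalg.norm(V_bar @ V_bar.T - V_hat @ V_hat.T)     # same right projector
gap_orth = np.linalg.norm(V_bar.T @ V_bar - np.eye(r0))          # V_bar is orthonormal
```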



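Lemma 9 Let $H = \mathcal{B}(\hat{C})$, then we have

$$V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) = \lambda\mathcal{P}^L_{V_0}(X^TH).$$

Proof Notice that the Lagrange dual function of the oracle problem (12) is

$$\begin{aligned}
\mathcal{L}(Z, C, Y, Y_1, Y_2) ={}& \|Z\|_* + \lambda\|C\|_{2,1} + \langle Y, X - XZ - C\rangle \\
&+ \langle Y_1, \mathcal{P}^L_{V_0}(Z) - Z\rangle + \langle Y_2, \mathcal{P}_{\mathcal{I}_0}(C) - C\rangle,
\end{aligned}$$

where $Y$, $Y_1$ and $Y_2$ are Lagrange multipliers. Since $(\hat{Z}, \hat{C})$ is a solution to problem (12),
we have

$$0 \in \partial\mathcal{L}_Z(\hat{Z}, \hat{C}, Y, Y_1, Y_2) \quad \text{and} \quad 0 \in \partial\mathcal{L}_C(\hat{Z}, \hat{C}, Y, Y_1, Y_2).$$

Hence, there exist $W$, $H$ and $L$ such that

$$\begin{aligned}
&\mathcal{P}_{\hat{T}}(W) = 0, \quad \|W\| \le 1, \quad \hat{U}\hat{V}^T + W \in \partial\|\hat{Z}\|_*, \\
&H = \mathcal{B}(\hat{C}), \quad \mathcal{P}_{\mathcal{I}_0}(L) = 0, \quad \|L\|_{2,\infty} \le 1, \quad H + L \in \partial\|\hat{C}\|_{2,1}, \\
&\hat{U}\hat{V}^T + W - X^TY - \mathcal{P}^L_{V_0^\perp}(Y_1) = 0, \\
&\lambda(H + L) - Y - \mathcal{P}_{\mathcal{I}_0^c}(Y_2) = 0.
\end{aligned}$$

Let $A = W - Y_1$ and $B = \lambda L - Y_2$, then the last two equations above imply that

$$V_0\bar{V}^T + \mathcal{P}^L_{V_0^\perp}(A) = \lambda X^TH + \mathcal{P}_{\mathcal{I}_0^c}(X^TB). \qquad (13)$$

Furthermore, we have

$$\begin{aligned}
\mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}(V_0\bar{V}^T + \mathcal{P}^L_{V_0^\perp}(A)) &= \mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}(V_0\bar{V}^T) + \mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}^L_{V_0^\perp}(A) \\
&= V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) + \mathcal{P}^L_{V_0}\mathcal{P}^L_{V_0^\perp}\mathcal{P}_{\mathcal{I}_0}(A) \\
&= V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T). \qquad (14)
\end{aligned}$$

Similarly, we have

$$\begin{aligned}
\mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}(\lambda X^TH + \mathcal{P}_{\mathcal{I}_0^c}(X^TB)) &= \mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}(\lambda X^TH) + \mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\mathcal{I}_0^c}(X^TB) \\
&= \mathcal{P}^L_{V_0}\mathcal{P}_{\mathcal{I}_0}(\lambda X^TH) = \lambda\mathcal{P}^L_{V_0}(X^T\mathcal{P}_{\mathcal{I}_0}(H)) \\
&= \lambda\mathcal{P}^L_{V_0}(X^TH). \qquad (15)
\end{aligned}$$

Combining (13), (14) and (15) together, we have

$$V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) = \lambda\mathcal{P}^L_{V_0}(X^TH).$$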



Before constructing a matrix $Q$ that satisfies the conditions in Theorem 3, we shall prove
that $\mathcal{P}_{\mathcal{I}_0} \cap \mathcal{P}_{\hat{V}} = \{0\}$ can be satisfied by choosing an appropriate parameter $\lambda$.

Definition 5 Recalling the definition of $\bar{V}$, define the matrix $G$ as

$$G \triangleq \mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T.$$

Then we have

$$G = \sum_{i \in \mathcal{I}_0}[\bar{V}^T]_i([\bar{V}^T]_i)^T \preceq \sum_{i=1}^{n}[\bar{V}^T]_i([\bar{V}^T]_i)^T = \bar{V}^T\bar{V} = I,$$

where $\preceq$ is the generalized inequality induced by the positive semi-definite cone. Hence,
$\|G\| \le 1$.
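As a small illustration of Definition 5, the sketch below forms $G$ from a random orthonormal matrix standing in for $\bar{V}$ and a hypothetical index set $\mathcal{I}_0$ (both are assumptions made for the example), and checks that $I - G$ is positive semi-definite, so $\|G\| \le 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r0 = 15, 4
I0 = np.array([11, 12, 13, 14])                     # hypothetical outlier indices

V_bar, _ = np.linalg.qr(rng.standard_normal((n, r0)))   # stand-in for V_bar

# P_{I0}(V_bar^T): keep the columns indexed by I0, zero out the rest.
Vt_I0 = np.zeros((r0, n))
Vt_I0[:, I0] = V_bar.T[:, I0]

G = Vt_I0 @ Vt_I0.T
psi = np.linalg.norm(G, 2)                          # spectral norm of G
eigs = np.linalg.eigvalsh(np.eye(r0) - G)           # eigenvalues of I - G
```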

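The following lemma states that $\|G\|$ can be far away from 1 by choosing an appropriate $\lambda$.

Lemma 10 Let $\psi = \|G\|$, then $\psi \le \lambda^2\|X\|^2\gamma n$.

Proof Notice that

$$\begin{aligned}
\psi = \|\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T\| &= \|V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^TV_0^T\| \\
&= \|(V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))(V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T\|.
\end{aligned}$$

By Lemma 9, we have

$$\begin{aligned}
\psi &= \|\lambda\mathcal{P}^L_{V_0}(X^TH)(\lambda\mathcal{P}^L_{V_0}(X^TH))^T\| \\
&= \lambda^2\|\mathcal{P}^L_{V_0}(X^TH)(\mathcal{P}^L_{V_0}(X^TH))^T\| \\
&\le \lambda^2\|\mathcal{P}^L_{V_0}(X^TH)\|\,\|(\mathcal{P}^L_{V_0}(X^TH))^T\| \\
&\le \lambda^2\|X^TH\|^2 \le \lambda^2\|X\|^2\|H\|^2 \\
&\le \lambda^2\|X\|^2|\mathcal{I}_0| = \lambda^2\|X\|^2\gamma n,
\end{aligned}$$

where $\|H\|^2 \le |\mathcal{I}_0| = \gamma n$ is due to Lemma 5. ■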

The above lemma bounds $\psi$ far away from 1. In particular, for $\lambda \le \frac{3}{7\|X\|\sqrt{\gamma n}}$ we have $\psi < \frac{1}{4}$.

So we can assume that $\psi < 1$ in the sequel.

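Lemma 11 If $\psi < 1$, then $\mathcal{P}_{\hat{V}} \cap \mathcal{P}_{\mathcal{I}_0} = \mathcal{P}_{\bar{V}} \cap \mathcal{P}_{\mathcal{I}_0} = \{0\}$.

Proof Let $M \in \mathcal{P}_{\bar{V}} \cap \mathcal{P}_{\mathcal{I}_0}$, then we have

$$\begin{aligned}
\|M\|^2 = \|MM^T\| &= \|\mathcal{P}_{\mathcal{I}_0}(M)(\mathcal{P}_{\mathcal{I}_0}(M))^T\| = \|\mathcal{P}_{\mathcal{I}_0}(M\bar{V}\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(M\bar{V}\bar{V}^T))^T\| \\
&= \|M\bar{V}\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T\bar{V}^TM^T\| \\
&\le \|M\|^2\|\bar{V}\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T\bar{V}^T\| = \|M\|^2\|\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T\| = \|M\|^2\psi \\
&\le \|M\|^2.
\end{aligned}$$

Since $\psi < 1$, the last equality can hold only if $\|M\| = 0$, and hence $M = 0$. Also, note that
$\mathcal{P}_{\hat{V}} = \mathcal{P}_{\bar{V}}$, which completes the proof. ■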

The following lemma plays a key role in constructing $Q$ that satisfies the conditions in
Theorem 3.






Liu, Xu and Yan 



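Lemma 12 If $\psi < 1$, then the operator $\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}$ is an injection from $\mathcal{P}_{\hat{V}}$ to $\mathcal{P}_{\hat{V}}$, and its
inverse operator is $I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i$.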

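Proof For any matrix $M$ such that $\|M\| = 1$, we have (recall that $\hat{V}\hat{V}^T = \bar{V}\bar{V}^T$)

$$\begin{aligned}
\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}}(M) &= \mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}(M\bar{V}\bar{V}^T) \\
&= \mathcal{P}_{\hat{V}}(M\bar{V}\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)) \\
&= M\bar{V}\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)\bar{V}\bar{V}^T \\
&= M\bar{V}(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)\bar{V})\bar{V}^T \\
&= M\bar{V}(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T)(\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T))^T)\bar{V}^T \\
&= M\bar{V}G\bar{V}^T,
\end{aligned}$$

which leads to $\|\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}}\| \le \|G\| = \psi$. Since $\psi < 1$, $I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i$ is well defined,
and has a spectral norm not larger than $1/(1 - \psi)$.
Note that

$$\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}} = \mathcal{P}_{\hat{V}}(I - \mathcal{P}_{\mathcal{I}_0})\mathcal{P}_{\hat{V}} = \mathcal{P}_{\hat{V}}(I - \mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}}),$$

thus for any $M \in \mathcal{P}_{\hat{V}}$ the following holds

$$\begin{aligned}
\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)(M) &= \mathcal{P}_{\hat{V}}(I - \mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)(M) \\
&= \mathcal{P}_{\hat{V}}(M) = M.
\end{aligned}$$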
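The Neumann-series inverse in Lemma 12 can be checked numerically. The following is a sketch under illustrative assumptions (random orthonormal $\bar{V}$, a hypothetical index set, and the series truncated at 200 terms): on matrices of the form $M = A\bar{V}^T$, the operator $\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}}$ acts as $M \mapsto M\bar{V}G\bar{V}^T$, so the inverse reduces to a matrix series in $G$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r0 = 40, 3
mask = np.zeros(n)
mask[35:] = 1.0                                   # hypothetical I_0: last five indices

V_bar, _ = np.linalg.qr(rng.standard_normal((n, r0)))
G = (V_bar.T * mask) @ V_bar                      # P_{I0}(V^T) P_{I0}(V^T)^T
psi = np.linalg.norm(G, 2)

# Truncated Neumann series I + G + G^2 + ... (converges because psi < 1).
series = np.eye(r0)
term = np.eye(r0)
for _ in range(200):
    term = term @ G
    series += term

def forward(M):
    """Apply P_V P_{I0^c} P_V to M."""
    return ((M @ V_bar @ V_bar.T) * (1.0 - mask)) @ V_bar @ V_bar.T

def inverse(M):
    """Apply I + sum_i (P_V P_{I0} P_V)^i to M, for M in the range of P_V."""
    return M @ V_bar @ series @ V_bar.T

M = rng.standard_normal((7, r0)) @ V_bar.T        # a matrix with M V V^T = M
round_trip = np.linalg.norm(forward(inverse(M)) - M)
```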



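Lemma 13 We have

$$\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \le \sqrt{\frac{\mu r_0}{(1 - \gamma)n}}.$$

Proof Notice that $X = X\hat{Z} + \hat{C}$ and $\mathcal{P}_{\mathcal{I}_0^c}(X) = X_0 = \mathcal{P}_{\mathcal{I}_0^c}(X_0)$. Then we have

$$\begin{aligned}
X = X\hat{Z} + \hat{C} &\Rightarrow \mathcal{P}_{\mathcal{I}_0^c}(X_0) = X\mathcal{P}_{\mathcal{I}_0^c}(\hat{Z}) \\
&\Rightarrow V_0^T = \mathcal{P}_{\mathcal{I}_0^c}(V_0^T) = \Sigma_0^{-1}U_0^TX\hat{U}\hat{\Sigma}\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T),
\end{aligned}$$

which implies that the rows of $\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)$ span the rows of $V_0^T$. However, the rank of $\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)$
is at most $r_0$ (this is because the rank of both $\hat{U}$ and $\hat{V}$ is $r_0$). Thus, it can be concluded
that $\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)$ is of full row rank. At the same time, we have

$$0 \prec \mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)(\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T))^T \preceq I.$$

So, there exists a symmetric, invertible matrix $Y \in \mathbb{R}^{r_0 \times r_0}$ such that

$$\|Y\| \le 1 \quad \text{and} \quad Y^2 = \mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)(\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T))^T.$$

This in turn implies that $Y^{-1}\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)$ has orthonormal rows. Since $\mathcal{P}_{\mathcal{I}_0^c}(V_0^T) = V_0^T$ is also
row orthonormal, it can be concluded that there exists a row orthonormal matrix $R$ such that

$$Y^{-1}\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T) = R\mathcal{P}_{\mathcal{I}_0^c}(V_0^T).$$

Then we have

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)\|_{2,\infty} &= \|YR\mathcal{P}_{\mathcal{I}_0^c}(V_0^T)\|_{2,\infty} \\
&\le \|Y\|\,\|R\mathcal{P}_{\mathcal{I}_0^c}(V_0^T)\|_{2,\infty} \le \|R\mathcal{P}_{\mathcal{I}_0^c}(V_0^T)\|_{2,\infty} \\
&\le \|\mathcal{P}_{\mathcal{I}_0^c}(V_0^T)\|_{2,\infty} \\
&\le \sqrt{\frac{\mu r_0}{(1 - \gamma)n}},
\end{aligned}$$

where the last inequality is from the definition of $\mu$.
By the definition of $\bar{V}$, we further have

$$\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} = \|\mathcal{P}_{\mathcal{I}_0^c}(V_0^T\hat{U}\hat{V}^T)\|_{2,\infty} = \|V_0^T\hat{U}\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)\|_{2,\infty} \le \|\mathcal{P}_{\mathcal{I}_0^c}(\hat{V}^T)\|_{2,\infty} \le \sqrt{\frac{\mu r_0}{(1 - \gamma)n}}.$$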



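Now we define $Q_1$ and $Q_2$, which are used to construct the matrix $Q$ that satisfies the conditions
in Theorem 3.

Definition 6 Define $Q_1$ and $Q_2$ as follows:

$$\begin{aligned}
Q_1 &\triangleq \lambda\mathcal{P}^L_{V_0}(X^TH) = V_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T), \\
Q_2 &\triangleq \lambda\mathcal{P}^L_{V_0^\perp}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}(X^TH) \\
&= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH),
\end{aligned}$$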

where the equalities are due to Lemma\Q and Lemma^ 

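The following theorem almost finishes the proof of Theorem 1.

Theorem 4 Let the SVD of the dictionary matrix $X$ be $U_X\Sigma_XV_X^T$. Assume $\psi < 1$. Let

$$Q \triangleq U_X\Sigma_X^{-1}V_X^T(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2).$$

If

$$\frac{\gamma}{1 - \gamma} < \frac{\beta^2(1 - \psi)^2}{(3 - \psi + \beta)^2\mu r_0}$$

and

$$\frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}}{\|X\|\sqrt{n}\Big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big)} < \lambda < \frac{1 - \psi}{\|X\|\sqrt{\gamma n}(2 - \psi)},$$

then $Q$ satisfies the conditions in Theorem 3, i.e., it is the dual certificate.

Proof By Lemma 11, $\psi < 1$ ensures that $\mathcal{P}_{\hat{V}} \cap \mathcal{P}_{\mathcal{I}_0} = \{0\}$. Hence
it is sufficient to show that $Q$ simultaneously satisfies

(S1) $\mathcal{P}_{\hat{U}}(X^TQ) = \hat{U}\hat{V}^T$,

(S2) $\mathcal{P}_{\hat{V}}(X^TQ) = \hat{U}\hat{V}^T$,

(S3) $\mathcal{P}_{\mathcal{I}_0}(Q) = \lambda H$,

(S4) $\|\mathcal{P}_{\hat{T}^\perp}(X^TQ)\| < 1$,

(S5) $\|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty} < \lambda$.

We prove that each of these five conditions holds, in S1-S5. Then in S6, we show that the
condition on $\lambda$ is not vacuous, i.e., the lower bound is strictly less than the upper bound.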

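First of all, we shall simplify the formula of $X^TQ$ that will be used several times in
the following process. Recalling the setting that assumes $\mathcal{P}^L_{V_X}(V_0) = V_0$, we have that
$\mathcal{P}^L_{V_X}(Q_1) = Q_1$ and

$$\begin{aligned}
\mathcal{P}^L_{V_X}(Q_2) &= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_X}\mathcal{P}^L_{V_0^\perp}(X^TH) \\
&= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}((V_XV_X^T - V_0V_0^T)X^TH) \\
&= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}((I - V_0V_0^T)X^TH) \\
&= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH) \\
&= Q_2.
\end{aligned}$$

Further, we have

$$\begin{aligned}
X^TQ &= V_XV_X^T(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) = \mathcal{P}^L_{V_X}(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) \\
&= \mathcal{P}^L_{V_X}(V_0\bar{V}^T) + \lambda\mathcal{P}^L_{V_X}(X^TH) - \mathcal{P}^L_{V_X}(Q_1) - \mathcal{P}^L_{V_X}(Q_2) \\
&= V_0\bar{V}^T + \lambda X^TH - \mathcal{P}^L_{V_X}(Q_1) - \mathcal{P}^L_{V_X}(Q_2) \\
&= V_0\bar{V}^T + \lambda X^TH - Q_1 - \mathcal{P}^L_{V_X}(Q_2) \\
&= V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2.
\end{aligned}$$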

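S1: Note that $\mathcal{P}^L_{V_0}(Q_1) = \lambda\mathcal{P}^L_{V_0}(X^TH)$ and $\mathcal{P}^L_{V_0}(Q_2) = 0$. Thus we have

$$\begin{aligned}
\mathcal{P}_{\hat{U}}(X^TQ) &= \mathcal{P}_{\hat{U}}(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) \\
&= \mathcal{P}^L_{V_0}(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) \\
&= V_0\bar{V}^T + \lambda\mathcal{P}^L_{V_0}(X^TH) - \mathcal{P}^L_{V_0}(Q_1) - \mathcal{P}^L_{V_0}(Q_2) \\
&= V_0\bar{V}^T - \mathcal{P}^L_{V_0}(Q_2) \\
&= V_0\bar{V}^T = \hat{U}\hat{V}^T.
\end{aligned}$$

S2: First note that

$$\begin{aligned}
\mathcal{P}_{\hat{V}}(Q_2) &= \lambda\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH) \\
&= \lambda\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH),
\end{aligned}$$

which follows from the fact that the operator $\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}$ is an injection from $\mathcal{P}_{\hat{V}}$ to $\mathcal{P}_{\hat{V}}$, and its
inverse is given by $I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i$.

Thus we have

$$\begin{aligned}
\mathcal{P}_{\hat{V}}(X^TQ) &= \mathcal{P}_{\hat{V}}(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) \\
&= V_0\bar{V}^T + \lambda\mathcal{P}_{\hat{V}}(X^TH) - \lambda\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0}(X^TH) - \mathcal{P}_{\hat{V}}(Q_2) \\
&= V_0\bar{V}^T + \lambda\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH) - \mathcal{P}_{\hat{V}}(Q_2) \\
&= V_0\bar{V}^T = \hat{U}\hat{V}^T.
\end{aligned}$$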



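S3: We have

$$\begin{aligned}
\mathcal{P}_{\mathcal{I}_0}(Q) &= \mathcal{P}_{\mathcal{I}_0}(U_X\Sigma_X^{-1}V_X^T(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2)) \\
&= U_X\Sigma_X^{-1}V_X^TV_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) + \lambda U_XU_X^T\mathcal{P}_{\mathcal{I}_0}(H) - U_X\Sigma_X^{-1}V_X^T\mathcal{P}_{\mathcal{I}_0}(Q_1) \\
&= U_X\Sigma_X^{-1}V_X^TV_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) + \lambda U_XU_X^TH - U_X\Sigma_X^{-1}V_X^T\mathcal{P}_{\mathcal{I}_0}(Q_1) \\
&= U_X\Sigma_X^{-1}V_X^TV_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) + \lambda U_XU_X^TH - U_X\Sigma_X^{-1}V_X^TV_0\mathcal{P}_{\mathcal{I}_0}(\bar{V}^T) \\
&= \lambda U_XU_X^TH = \lambda\mathcal{P}_{U_X}(H).
\end{aligned}$$

By $\hat{C} = X(I - \hat{Z})$, we have that $\hat{C} \in \mathcal{P}_{U_X}$ and so

$$H = \mathcal{B}(\hat{C}) \in \mathcal{P}_{U_X},$$

which finishes the proof of $\mathcal{P}_{\mathcal{I}_0}(Q) = \lambda H$.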
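S4: Since $\mathcal{P}_{\hat{T}^\perp}(V_0\bar{V}^T) = \mathcal{P}_{\hat{T}^\perp}(Q_1) = 0$, we have

$$\begin{aligned}
\mathcal{P}_{\hat{T}^\perp}(X^TQ) &= \mathcal{P}_{\hat{T}^\perp}(V_0\bar{V}^T + \lambda X^TH - Q_1 - Q_2) \\
&= \lambda\mathcal{P}_{\hat{V}^\perp}\mathcal{P}^L_{V_0^\perp}(X^TH) - \lambda\mathcal{P}_{\hat{V}^\perp}\mathcal{P}^L_{V_0^\perp}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}(X^TH).
\end{aligned}$$

First, it can be calculated that

$$\|\mathcal{P}_{\hat{V}^\perp}\mathcal{P}^L_{V_0^\perp}(X^TH)\| \le \|X^TH\| \le \|X\|\|H\| \le \|X\|\sqrt{\gamma n},$$

where $\|H\| \le \sqrt{\gamma n}$ is due to Lemma 5.

Second, we have the following

$$\begin{aligned}
&\Big\|\mathcal{P}_{\hat{V}^\perp}\mathcal{P}^L_{V_0^\perp}\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}(X^TH)\Big\| \\
&\quad \le \Big\|\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}(X^TH)\Big\| \\
&\quad \le \Big\|\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}(X^TH)\Big\| \\
&\quad \le \frac{1}{1 - \psi}\|X\|\sqrt{\gamma n}.
\end{aligned}$$

Thus we have that

$$\begin{aligned}
\|\mathcal{P}_{\hat{T}^\perp}(X^TQ)\| < 1 &\Leftarrow \lambda\Big(\|X\|\sqrt{\gamma n} + \frac{\|X\|\sqrt{\gamma n}}{1 - \psi}\Big) < 1 \\
&\Leftarrow \lambda < \frac{1 - \psi}{\|X\|\sqrt{\gamma n}(2 - \psi)}.
\end{aligned}$$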



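S5: Note that $\mathcal{P}_{\mathcal{I}_0^c}(X^TH) = \mathcal{P}_{\mathcal{I}_0^c}(Q_1) = 0$. So we only need to bound the remaining two parts.
By Lemma 7, we have

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^TV_0\bar{V}^T)\|_{2,\infty} &= \|U_X\Sigma_X^{-1}V_X^TV_0\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&\le \|U_X\Sigma_X^{-1}V_X^TV_0\|\,\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&= \|\Sigma_X^{-1}V_X^TV_0\|\,\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&\le \frac{1}{\beta\|X\|}\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&\le \frac{1}{\beta\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}}, \qquad (16)
\end{aligned}$$

where $\|\Sigma_X^{-1}V_X^TV_0\| \le \frac{1}{\beta\|X\|}$ is due to the definition of $\beta$, and the last inequality is due to
Lemma 13.

We expand $Q_2$ for convenience:

$$\begin{aligned}
Q_2 &= \lambda\mathcal{P}_{\mathcal{I}_0^c}\mathcal{P}_{\hat{V}}\Big(I + \sum_{i=1}^{\infty}(\mathcal{P}_{\hat{V}}\mathcal{P}_{\mathcal{I}_0}\mathcal{P}_{\hat{V}})^i\Big)\mathcal{P}_{\hat{V}}\mathcal{P}^L_{V_0^\perp}(X^TH) \\
&= \lambda(I - V_0V_0^T)(X^TH)\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T).
\end{aligned}$$

Write $Q_2 = \lambda(\tilde{Q}_2 - \bar{Q}_2)$, with

$$\begin{aligned}
\tilde{Q}_2 &= X^TH\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T), \\
\bar{Q}_2 &= V_0V_0^TX^TH\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T).
\end{aligned}$$

Then we have

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^T\tilde{Q}_2)\|_{2,\infty} &= \|U_X\Sigma_X^{-1}V_X^T\mathcal{P}_{\mathcal{I}_0^c}(\tilde{Q}_2)\|_{2,\infty} \\
&= \Big\|U_XU_X^TH\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\Big\|_{2,\infty} \\
&\le \Big\|H\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\Big\|_{2,\infty} \\
&\le \|H\|\,\Big\|\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\Big\|_{2,\infty} \\
&\le \|H\|\,\|\bar{V}\bar{V}^T\|\,\Big\|I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big\|\,\|\bar{V}\|\,\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&\le \frac{\sqrt{\gamma n}}{1 - \psi}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} = \frac{1}{1 - \psi}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} \qquad (17)
\end{aligned}$$

and

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^T\bar{Q}_2)\|_{2,\infty} &= \|U_X\Sigma_X^{-1}V_X^T\mathcal{P}_{\mathcal{I}_0^c}(\bar{Q}_2)\|_{2,\infty} \\
&= \Big\|U_X\Sigma_X^{-1}V_X^TV_0V_0^TX^TH\bar{V}\bar{V}^T\Big(I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big)\bar{V}\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\Big\|_{2,\infty} \\
&\le \|\Sigma_X^{-1}V_X^TV_0\|\,\|V_0^TX^T\|\,\|H\|\,\|\bar{V}\bar{V}^T\|\,\Big\|I + \sum_{i=1}^{\infty}\bar{V}G^i\bar{V}^T\Big\|\,\|\bar{V}\|\,\|\mathcal{P}_{\mathcal{I}_0^c}(\bar{V}^T)\|_{2,\infty} \\
&\le \frac{1}{\beta\|X\|}\|X\|\frac{\sqrt{\gamma n}}{1 - \psi}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} \\
&= \frac{1}{\beta(1 - \psi)}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}. \qquad (18)
\end{aligned}$$

Combining (16), (17) and (18) together, we have

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty} &\le \|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^TV_0\bar{V}^T)\|_{2,\infty} + \lambda\|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^T\tilde{Q}_2)\|_{2,\infty} + \lambda\|\mathcal{P}_{\mathcal{I}_0^c}(U_X\Sigma_X^{-1}V_X^T\bar{Q}_2)\|_{2,\infty} \\
&\le \frac{1}{\beta\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} + \frac{\lambda}{1 - \psi}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} + \frac{\lambda}{\beta(1 - \psi)}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} \\
&= \frac{1}{\beta\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} + \frac{\lambda(1 + \beta)}{\beta(1 - \psi)}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}.
\end{aligned}$$

Hence,

$$\begin{aligned}
\|\mathcal{P}_{\mathcal{I}_0^c}(Q)\|_{2,\infty} < \lambda &\Leftarrow \frac{1}{\beta\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} + \frac{\lambda(1 + \beta)}{\beta(1 - \psi)}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} < \lambda \\
&\Leftarrow \frac{1}{\beta\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} < \lambda\Big(1 - \frac{1 + \beta}{\beta(1 - \psi)}\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big) \\
&\Leftarrow \frac{1 - \psi}{\|X\|}\sqrt{\frac{\mu r_0}{(1 - \gamma)n}} < \lambda\Big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big) \\
&\Leftarrow \lambda > \frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}}{\|X\|\sqrt{n}\Big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big)},
\end{aligned}$$

as long as $\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} > 0$, which is proven in the following step.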

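S6: We have shown that each of the five conditions holds. Finally, we show that the bounds on
$\lambda$ can be satisfied. But this amounts to a condition on the outlier fraction $\gamma$. Indeed,
we have

$$\begin{aligned}
&\frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}}{\|X\|\sqrt{n}\Big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big)} < \frac{1 - \psi}{\|X\|\sqrt{\gamma n}(2 - \psi)} \\
&\quad \Leftarrow (2 - \psi)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} < \beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} \\
&\quad \Leftarrow \frac{\gamma}{1 - \gamma} < \frac{\beta^2(1 - \psi)^2}{(3 - \psi + \beta)^2\mu r_0},
\end{aligned}$$

which can be satisfied, since the right-hand side does not depend on $\gamma$. Moreover,
this condition also ensures $\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} > 0$.

We have thus shown that if $\psi < 1$ and $\lambda$ is within the given bounds, we can construct a
dual certificate. From here, the following lemma immediately establishes our main result,
Theorem 1.

Figure 3: The success rates obtained from 50 random trials. (a) When $\lambda = 0.2$, the success
rates obtained under various settings of the outlier fraction $\gamma$. (b) When the
outlier fraction is fixed to be $\gamma = 0.5$, plotting the success rate as a function of
the parameter $\lambda$. In these experiments, the "success" is measured in terms of
exact recovery, i.e., $U^*(U^*)^T = V_0V_0^T$ and $\mathcal{I}^* = \mathcal{I}_0$.

Lemma 14 Let $\gamma^*$ be such that

$$\frac{\gamma^*}{1 - \gamma^*} = \frac{324\beta^2}{49(11 + 4\beta)^2\mu r_0},$$

then LRR, with $\lambda = \frac{3}{7\|X\|\sqrt{\gamma^* n}}$, strictly succeeds as long as $\gamma \le \gamma^*$.

Proof First note that

$$\frac{324\beta^2}{49(11 + 4\beta)^2\mu r_0} = \frac{36}{49}\cdot\frac{\beta^2(1 - \frac{1}{4})^2}{(3 - \frac{1}{4} + \beta)^2\mu r_0}.$$

Lemma 10 implies that as long as $\gamma \le \gamma^*$ we have the following:

$$\psi \le \lambda^2\|X\|^2\gamma n = \frac{9\gamma}{49\gamma^*} \le \frac{9}{49} < \frac{1}{4}.$$

Hence, we have

$$\frac{\beta^2(1 - \psi)^2}{(3 - \psi + \beta)^2\mu r_0} > \frac{\beta^2(1 - \frac{1}{4})^2}{(3 - \frac{1}{4} + \beta)^2\mu r_0},$$

and so

$$\frac{\gamma^*}{1 - \gamma^*} < \frac{36}{49}\cdot\frac{\beta^2(1 - \psi)^2}{(3 - \psi + \beta)^2\mu r_0} \;\Rightarrow\; \mu r_0 < \frac{36}{49}\cdot\frac{\beta^2(1 - \psi)^2(1 - \gamma^*)}{(3 - \psi + \beta)^2\gamma^*}.$$

Note that $\frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}}{\|X\|\sqrt{n}\big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\big)}$, as a function of $\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}$, is strictly increasing. Moreover,
$\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0} < \frac{\beta(1 - \psi)}{3 - \psi + \beta}$, and thus

$$\begin{aligned}
\frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}}{\|X\|\sqrt{n}\Big(\beta(1 - \psi) - (1 + \beta)\sqrt{\frac{\gamma}{1 - \gamma}\mu r_0}\Big)} &< \frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma}}(3 - \psi + \beta)}{\|X\|\sqrt{n}\,\beta(1 - \psi)(2 - \psi)} \\
&\le \frac{(1 - \psi)\sqrt{\frac{\mu r_0}{1 - \gamma^*}}(3 - \psi + \beta)}{\|X\|\sqrt{n}\,\beta(1 - \psi)(2 - \psi)} \\
&< \frac{6(1 - \psi)}{7\|X\|\sqrt{\gamma^* n}(2 - \psi)} \\
&\le \frac{3}{7\|X\|\sqrt{\gamma^* n}},
\end{aligned}$$

where the last inequality holds because $\psi \ge 0$.

By $\psi < 1/4$, we also have

$$\frac{1 - \psi}{\|X\|\sqrt{\gamma n}(2 - \psi)} \ge \frac{1 - \psi}{\|X\|\sqrt{\gamma^* n}(2 - \psi)} > \frac{1 - \frac{1}{4}}{\|X\|\sqrt{\gamma^* n}(2 - \frac{1}{4})} = \frac{3}{7\|X\|\sqrt{\gamma^* n}}.$$

Hence, $\lambda = \frac{3}{7\|X\|\sqrt{\gamma^* n}}$ always satisfies the given bounds, as long as the outlier fraction $\gamma$ is
not higher than $\gamma^*$.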
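To make the quantities in Lemma 14 concrete, the sketch below evaluates $\gamma^*$, the prescribed $\lambda$, and the two bounds of Theorem 4 for one set of illustrative parameter values. The values of $\beta$, $\mu$, $r_0$, $\|X\|$ and $n$ are assumptions chosen for the example, and $\psi$ is replaced by its worst-case bound $9/49$ from Lemma 10; this is an arithmetic check, not part of the proof.

```python
import math

def gamma_star(beta, mu, r0):
    """Solve gamma*/(1 - gamma*) = 324 beta^2 / (49 (11 + 4 beta)^2 mu r0)."""
    c = 324.0 * beta**2 / (49.0 * (11.0 + 4.0 * beta) ** 2 * mu * r0)
    return c / (1.0 + c)

def lemma14_lambda(x_norm, n, g_star):
    """lambda = 3 / (7 ||X|| sqrt(gamma* n))."""
    return 3.0 / (7.0 * x_norm * math.sqrt(g_star * n))

def theorem4_bounds(psi, gamma, beta, mu, r0, x_norm, n):
    """Lower and upper bounds on lambda from Theorem 4."""
    t = math.sqrt(gamma * mu * r0 / (1.0 - gamma))
    lower = (1.0 - psi) * math.sqrt(mu * r0 / (1.0 - gamma)) / (
        x_norm * math.sqrt(n) * (beta * (1.0 - psi) - (1.0 + beta) * t))
    upper = (1.0 - psi) / (x_norm * math.sqrt(gamma * n) * (2.0 - psi))
    return lower, upper

beta, mu, r0, x_norm, n = 1.0, 1.0, 5, 10.0, 400    # illustrative values only
g = gamma_star(beta, mu, r0)
lam = lemma14_lambda(x_norm, n, g)
psi_bound = lam**2 * x_norm**2 * g * n              # Lemma 10 bound at gamma = gamma*
lower, upper = theorem4_bounds(psi_bound, g, beta, mu, r0, x_norm, n)
```

For these values the prescribed $\lambda$ falls strictly between the two bounds, as the lemma asserts.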



5. Experiments 



We present some numerical experiment results in this section. Our main goal is to validate
the theoretical results obtained in the previous section, not to verify the effectiveness of the
LRR algorithm itself. For the latter, we refer the readers to the numerous works that use
LRR to achieve state-of-the-art performance in applications including motion segmentation
(Liu et al., 2010a; Liu and Yan, 2011; Favaro et al., 2011), image segmentation (Chen et al.,
2011), saliency detection (Lang et al., 2011) and face recognition (Liu and Yan, 2011).

5.1 Results on Synthetic Data 

Figure 4: Examples of the images in the Yale-Caltech dataset.

Theorem 1 states that there exists a parameter $\lambda$ such that LRR can work well when the
outlier fraction is not larger than a certain threshold. To explore this, we construct 5
pairwise disjoint subspaces $\{\mathcal{S}_i\}_{i=1}^5$ whose bases $\{U_i\}_{i=1}^5$ are computed by $U_{i+1} =
TU_i$, $1 \le i \le 4$, where $T$ is a random rotation and $U_1$ is a random orthonormal matrix of
dimension $500 \times 5$. Therefore, each subspace is a 5-dimensional subspace of $\mathbb{R}^{500}$. From
each subspace $\mathcal{S}_i$, we sample 40 data samples by $X_i = U_iR_i$, $1 \le i \le 5$, where $R_i$ is a $5 \times 40$
matrix with each entry uniformly distributed in $[-1, 1]$. We then construct the sample
matrix $X_0$ as $X_0 = [X_1, \cdots, X_5]$. Some outliers are randomly generated from a zero-mean
Gaussian distribution with standard deviation $s$, where $s$ is set to be the average absolute
value of the samples, to ensure that the samples and outliers have comparable magnitudes.
Fixing all the other configurations, we change the number of outliers and the parameter
$\lambda$, and observe whether LRR succeeds or not. More precisely, we claim that LRR succeeds
if exact recovery holds with a tolerance of 0.01%, i.e., $\|U^*(U^*)^T - V_0V_0^T\| \le 10^{-4}$ (i.e.,
$U^*(U^*)^T = V_0V_0^T$), and $\mathcal{I}^* = \mathcal{I}_0$ with $\mathcal{I}^* = \{i : \|[C^*]_i\|_2 \ge 10^{-4}\|[X]_i\|_2\}$. Figure 3(a) shows
that LRR succeeds when $\gamma$ is smaller than a threshold (0.6 in this example), with a sharp
phase transition observed. In addition, Figure 3(b) illustrates that there exists a parameter
range for obtaining exact recovery. These results are consistent with the statements in
Theorem 1.
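The synthetic setup of Section 5.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_data`, its parameters and the seed are hypothetical, and a random orthogonal matrix obtained from a QR factorization is used as a stand-in for the random rotation $T$ (it may have determinant $-1$).

```python
import numpy as np

def make_data(d=500, r=5, k=5, per=40, n_out=50, seed=0):
    """Samples from k subspaces of dimension r in R^d, followed by Gaussian outliers."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))    # random orthonormal basis U_1
    T, _ = np.linalg.qr(rng.standard_normal((d, d)))    # random orthogonal map (stand-in for T)
    blocks = []
    for _ in range(k):
        R = rng.uniform(-1.0, 1.0, size=(r, per))        # coefficients uniform in [-1, 1]
        blocks.append(U @ R)                              # X_i = U_i R_i
        U = T @ U                                         # U_{i+1} = T U_i
    X0 = np.hstack(blocks)
    s = np.mean(np.abs(X0))                               # average absolute value of the samples
    C0 = s * rng.standard_normal((d, n_out))              # zero-mean Gaussian outliers, std s
    X = np.hstack([X0, C0])
    labels = np.repeat(np.arange(k), per)                 # ground-truth subspace of each sample
    return X, X0, C0, labels
```

Calling `make_data()` with the defaults gives a $500 \times 250$ matrix whose first 200 columns are samples and whose last 50 columns are outliers; varying `n_out` changes the outlier fraction.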



5.2 Results on Real Data 



5.2.1 Datasets 

To test LRR's effectiveness in the presence of outliers and noise, we create a data set, which
we call "Yale-Caltech", by combining Extended Yale Database B (Lee et al., 2005) and
Caltech101 (Li et al.). For Extended Yale Database B, we remove the images pictured
under extreme light conditions, i.e., we only use the images with view directions smaller
than 45 degrees and light source directions smaller than 60 degrees, resulting in 1204
authentic samples drawn (approximately) from a union of 38 low-rank subspaces (each face
class corresponds to a subspace). For Caltech101, we only select the classes containing no
more than 40 images, resulting in 609 non-face outliers. Figure 4 shows some examples of this
dataset.



5.2.2 Evaluation Metrics 

Table 1: Segmentation accuracy (ACC) and AUC comparison on the Yale-Caltech dataset.

             PCA       RPCA_1    RPCA_{2,1}   LRR
ACC (%)      77.15     82.97     83.72        86.13
AUC          0.9653    0.9819    0.9863       0.9927

Segmentation Accuracy (ACC): The segmentation results can be evaluated in a similar
way as classification results. However, as segmentation does not provide a label for each
cluster, we postprocess the result to assign each cluster a label: given the ground truth
classification results, the label of a cluster is the index of the ground truth class that contributes
the maximum number of samples to the cluster. We then compute the segmentation
accuracy (ACC) as the percentage of correctly classified samples.
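The labeling rule described above can be written down directly. The following is a small sketch (the function name `segmentation_accuracy` is hypothetical): each cluster is assigned the ground-truth class that contributes the most samples to it, and ACC is the percentage of samples whose class agrees with their cluster's label.

```python
import numpy as np

def segmentation_accuracy(truth, clusters):
    """ACC (%) after labeling each cluster by its majority ground-truth class."""
    truth = np.asarray(truth)
    clusters = np.asarray(clusters)
    correct = 0
    for c in np.unique(clusters):
        members = truth[clusters == c]                 # ground-truth classes inside cluster c
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()                        # samples of the majority class
    return 100.0 * correct / truth.size

acc = segmentation_accuracy([0, 0, 0, 1, 1, 2], [5, 5, 7, 7, 7, 9])
```

In this toy call, five of the six samples agree with the majority class of their cluster.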

Areas Under Curve (AUC): Recall that, as shown in Theorem 1, the optimal solution
$C^*$ is column-sparse, and can be used to detect the outliers in data. In the noiseless
case, this can be done by simply identifying all the nonzero columns of $C^*$. In the noisy
cases, however, $C^*$ is only approximately column-sparse, and we have to threshold. That is,
the $i$-th data vector of $X$ is considered to be an outlier if and only if

$$\|[C^*]_i\|_2 > \delta,$$

where $\delta > 0$ is a parameter. To evaluate the effectiveness of outlier detection without
choosing a parameter $\delta$, we consider the receiver operating characteristic (ROC), which is
widely used to evaluate the performance of binary classifiers. The ROC curve is obtained
by trying all possible thresholding values, and for each value, plotting the true positive
rate on the Y-axis against the false positive rate on the X-axis. We use the area
under the ROC curve, known as AUC, to evaluate the quality of outlier detection. Note
that the AUC score ranges between 0 and 1, and a larger AUC score means more precise outlier
detection.
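The thresholding procedure above can be sketched with NumPy alone. The helper names `column_scores` and `roc_auc` are hypothetical; the sweep uses the observed column norms as candidate thresholds, and the area is accumulated with the trapezoidal rule.

```python
import numpy as np

def column_scores(C):
    """l2 norm of each column of C, used as the outlier score."""
    return np.linalg.norm(C, axis=0)

def roc_auc(scores, is_outlier):
    """Area under the ROC curve obtained by sweeping the threshold over all scores."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(is_outlier, dtype=bool)
    # Thresholds from +inf (nothing flagged) down to the smallest score (everything flagged).
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    tpr = np.array([(scores[y] >= t).mean() for t in thresholds])    # true positive rate
    fpr = np.array([(scores[~y] >= t).mean() for t in thresholds])   # false positive rate
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoidal rule

auc_perfect = roc_auc([0.9, 0.8, 0.1, 0.2], [True, True, False, False])
auc_half = roc_auc([0.9, 0.1, 0.8, 0.2], [True, True, False, False])
```

A perfect ranking of outliers above inliers gives an AUC of 1, while the second toy ranking orders only half of the outlier/inlier pairs correctly.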



5.2.3 Results 

The goal of this test is to identify the 609 non-face outliers and segment the remaining 1204 face
images into 38 clusters. The performance of segmentation and outlier detection is evaluated
by ACC and AUC, respectively. While investigating segmentation performance, the affinity
matrix is computed from all images, including both the face images and the non-face outliers.
Note here that the computation of ACC does not involve the outliers, as we need to clearly
explore the segmentation aspect of LRR.

We resize all images into $20 \times 20$ pixels and form a data matrix $X$ of size $400 \times 1813$. Table
1 shows the results of standard PCA, RPCA$_1$ proposed in Candès et al. (2009), RPCA$_{2,1}$
proposed in Xu et al. (2010) and LRR. Table 1 shows that LRR achieves the best performance
among all methods, both for subspace segmentation and for outlier detection. We believe
that the advantages of LRR, in terms of subspace segmentation, are mainly due to the fact
that it directly targets recovering the row space $V_0V_0^T$, which is known to determine the
correct segmentation. In contrast, PCA and RPCA methods are designed for recovering
the column space $U_0U_0^T$, which is designed for dimension reduction. In terms of outlier
detection, the advantages of LRR are due to the fact that this dataset has a structure of
multiple subspaces, while PCA and RPCA methods are designed for the case where data
come from a single subspace.

6. Conclusion 

This paper studies the problem of subspace segmentation in the presence of outliers. We
analyzed a convex formulation termed LRR, and showed that the optimal solution exactly
recovers the row space of the authentic data and identifies the outliers. Since the row space
determines the segmentation of data, LRR can perform subspace segmentation and outlier
identification simultaneously.

The analysis presented in this paper differs from previous work (e.g., Candès et al., 2009;
Xu et al., 2010) largely due to the fact that the dictionary used in (2) is the data matrix
$X$, as opposed to the (arguably easier) identity matrix $I$ used in Candès et al. (2009) and
Xu et al. (2010). As a future direction, it is interesting to investigate whether the technique
presented can be extended to general dictionary matrices other than $X$ or $I$.

Appendix A. Proofs 
A.1 Proof of Lemma 2

Proof Suppose the SVD of $X_0$ is $U_0\Sigma_0V_0^T$, and the SVD of $C_0$ is $U_C\Sigma_CV_C^T$. Suppose $U_0^\perp$
and $U_C^\perp$ are the orthogonal complements of $U_0$ and $U_C$, respectively. By the independence
between span($C_0$) and span($X_0$), $[U_0^\perp, U_C^\perp]$ spans the whole ambient space, and thus the
following linear equation system has feasible solutions $Y_0$ and $Y_C$:

$$U_0^\perp(U_0^\perp)^TY_0 + U_C^\perp(U_C^\perp)^TY_C = I.$$

Let $Y = I - U_0^\perp(U_0^\perp)^TY_0$, then it can be computed that

$$X_0^TY = X_0^T \quad \text{and} \quad C_0^TY = 0,$$

i.e., $X_0 = YX_0$ and $YC_0 = 0$ are feasible. By $\mathcal{P}_{\mathcal{I}_0^c}(X) = X_0$, $\mathcal{P}_{\mathcal{I}_0}(X) = C_0$, $\mathcal{P}_{\mathcal{I}_0^c}(X_0) = X_0$
and $\mathcal{P}_{\mathcal{I}_0^c}(C_0) = 0$, the following linear equation system has feasible solutions $Y$:

$$X_0 = YX,$$

which simply leads to $V_0 \in \mathcal{P}^L_{V_X}$.



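A.2 Proof of Lemma 3

Proof Suppose $U_X\Sigma_XV_X^T$ is the SVD of $X$, $U_0\Sigma_0V_0^T$ is the SVD of $X_0$, $U_C$ is the column
space of $C_0$, and $U_C^\perp$ is the orthogonal complement of $U_C$. By $X = X_0 + C_0$, $(U_C^\perp)^TX =
(U_C^\perp)^TX_0$ and thus

$$(U_C^\perp)^TU_X\Sigma_XV_X^T = (U_C^\perp)^TU_0\Sigma_0V_0^T,$$

from which it can be deduced that

$$(U_C^\perp)^TU_X = (U_C^\perp)^TU_0\Sigma_0V_0^TV_X\Sigma_X^{-1}.$$

Since span($C_0$) and span($X_0$) are independent of each other, $(U_C^\perp)^TU_0$ is of full column
rank. Let the SVD of $(U_C^\perp)^TU_0$ be $U_1\Sigma_1V_1^T$, then we have

$$V_0^TV_X\Sigma_X^{-1} = \Sigma_0^{-1}V_1\Sigma_1^{-1}U_1^T(U_C^\perp)^TU_X.$$

Hence,

$$\|\Sigma_X^{-1}V_X^TV_0\| = \|\Sigma_0^{-1}V_1\Sigma_1^{-1}U_1^T(U_C^\perp)^TU_X\| \le \|\Sigma_0^{-1}\|\,\|\Sigma_1^{-1}\| = \frac{1}{\sigma_{\min}(X_0)\sin(\theta)},$$

where $\|\Sigma_1^{-1}\| = 1/\sin(\theta)$ is concluded from (Knyazev et al., 2002). By $\|X\| \le \|X_0\| + \|C_0\|$,
we further have

$$\beta = \frac{1}{\|\Sigma_X^{-1}V_X^TV_0\|\,\|X\|} \ge \frac{\sigma_{\min}(X_0)\sin(\theta)}{\|X\|} \ge \frac{\sigma_{\min}(X_0)\sin(\theta)}{\|X_0\| + \|C_0\|} = \frac{\sin(\theta)}{\mathrm{cond}(X_0)\big(1 + \frac{\|C_0\|}{\|X_0\|}\big)}.$$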

Appendix B. List of Notations 



$X$                      The observed data matrix.
$X_0$                    The ground truth of the data matrix.
$C_0$                    The ground truth of the outliers.
cond($\cdot$)            The condition number of a matrix.
$d$                      The ambient data dimension, i.e., number of rows of $X$.
$n$                      The number of data points, i.e., number of columns of $X$.
$\mathcal{I}_0$          The indices of outliers, i.e., non-zero columns of $C_0$.
$\gamma$                 Fraction of outliers, which equals $|\mathcal{I}_0|/n$.
$U_0$, $V_0$             The left and right singular vectors of $X_0$.
$\mu$                    Incoherence parameter of $V_0$.
$\beta$                  RWD parameter of the dictionary $X$.
$\hat{Z}$, $\hat{C}$     The optimal solution of the Oracle Problem.
$\hat{U}$, $\hat{V}$     The left and right singular vectors of $\hat{Z}$.
$\bar{V}$                An auxiliary matrix defined in Lemma 8.
$\mathcal{B}(\cdot)$     An operator defined in Definition [U
$H$                      An auxiliary matrix defined in Lemma 9 as $H = \mathcal{B}(\hat{C})$.
$G$                      An auxiliary matrix defined in Definition 5.
$\psi$                   Defined in Lemma 10 as $\psi = \|G\|$.